| Type: | Package | 
| Title: | Import Texts from Files in the 'Alceste' Format Using the 'tm' Text Mining Framework | 
| Version: | 1.1.2 | 
| Date: | 2025-02-27 | 
| Imports: | NLP, tm (≥ 0.6) | 
| Suggests: | stringi | 
| Description: | Provides a 'tm' Source to create corpora from a corpus prepared in the format used by the 'Alceste' application (i.e. a single text file with inline meta-data). It is able to import both text contents and meta-data (starred) variables. | 
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] | 
| URL: | https://github.com/nalimilan/R.TeMiS | 
| BugReports: | https://github.com/nalimilan/R.TeMiS/issues | 
| NeedsCompilation: | no | 
| Packaged: | 2025-02-27 18:19:51 UTC; milan | 
| Author: | Milan Bouchet-Valat [aut, cre] | 
| Maintainer: | Milan Bouchet-Valat <nalimilan@club.fr> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-02-28 09:50:02 UTC | 
A plug-in for the tm text mining framework to import corpora from Alceste files
Description
This package provides a tm Source to create corpora from files in the format used by the Alceste application.
Details
Typical usage is to create a corpus from a manually prepared
Alceste file (here called myAlcesteCorpus.txt).
It is often necessary to specify the encoding of the texts
via the encoding argument of AlcesteSource.
    # Import corpus
    source <- AlcesteSource("myAlcesteCorpus.txt")
    corpus <- Corpus(source)
    # See how many articles were imported
    corpus
    # See the contents of the first article and its meta-data
    inspect(corpus[1])
    meta(corpus[[1]])
  
See AlcesteSource for more details and real examples.
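Since the encoding often has to be given explicitly, the following is a minimal sketch of passing it to AlcesteSource. The temporary file stands in for myAlcesteCorpus.txt; its contents, the variable names (author, year) and the choice of "UTF-8" are assumptions made for illustration, not part of the package.

```r
library(tm)
library(tm.plugin.alceste)

# Write a tiny stand-in for myAlcesteCorpus.txt to a temporary file
# (made-up contents; plain ASCII, which is also valid UTF-8)
path <- tempfile(fileext = ".txt")
writeLines(c("**** *author_smith *year_2020",
             "First text of the corpus.",
             "**** *author_jones *year_2021",
             "Second text of the corpus."),
           path)

# Declare the encoding explicitly instead of relying on the default
src <- AlcesteSource(path, encoding = "UTF-8")
corpus <- Corpus(src)

# Number of texts imported
length(corpus)
```

For a real corpus, replace "UTF-8" with the encoding the file was actually saved in.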
Author(s)
Milan Bouchet-Valat <nalimilan@club.fr>
References
https://image-zafar.com/Logicieluk.html
Alceste Source
Description
Construct a source for an input containing a set of texts saved in the Alceste format in a single text file.
Usage
  AlcesteSource(x, encoding = "auto")
Arguments
| x | Either a character identifying the file or a connection. | 
| encoding | A character string: if non-empty declares the encoding
used when reading the file, so the character data can be
re-encoded.  See the ‘Encoding’ section of the help for
 | 
Details
Several texts are saved in a single Alceste-formatted file, separated
by lines starting with “***” or digits, followed by starred
variables (see links below). These variables are set as document
meta-data that can be accessed via the meta function.
Currently, “theme” lines starting with “-*” are ignored.
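The layout described above can be sketched with a small hand-made file. This is an illustration under assumptions: the texts, the variable names (author, year) and the theme label are invented, and the separators used here (four stars, or a number followed by a starred variable) are one reading of the description above rather than a full specification of the format.

```r
library(tm)
library(tm.plugin.alceste)

# Two texts in one file: the first introduced by a starred line,
# the second by a line starting with digits. Both separator lines
# carry starred variables. The "-*" line is a theme line.
path <- tempfile(fileext = ".txt")
writeLines(c("**** *author_smith *year_2020",
             "-*theme_introduction",
             "Body of the first text.",
             "0002 *author_jones *year_2021",
             "Body of the second text."),
           path)

corpus <- Corpus(AlcesteSource(path, encoding = "UTF-8"))

# Starred variables are available as document meta-data
meta(corpus[[1]])

# The theme line should not appear in the text contents
content(corpus[[1]])
```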
Value
An object of class AlcesteSource which extends the class
Source, representing a set of texts in the Alceste format.
Author(s)
Milan Bouchet-Valat
See Also
https://image-zafar.com/sites/default/files/telechargements/formatage_alceste.pdf (in French) about the Alceste format
readAlceste for the function that actually parses
individual texts.
getSources to list available sources.
Examples
    library(tm)
    file <- system.file("texts", "alceste_test.txt", 
                        package = "tm.plugin.alceste")
    corpus <- Corpus(AlcesteSource(file))
    # See the contents of the documents
    inspect(corpus)
    # See meta-data associated with first article
    meta(corpus[[1]])
Read in a text in the Alceste format
Description
Read in a text in the Alceste format using starred variables.
Usage
  readAlceste(elem, language, id)
Arguments
| elem | A  | 
| language | A  | 
| id | A  | 
Value
A PlainTextDocument with the contents of the article and the available meta-data set.
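readAlceste is normally not called directly; it is applied to each text when a corpus is built from an AlcesteSource. A short sketch of checking the returned document type follows, assuming a made-up one-text file and an explicitly declared encoding.

```r
library(tm)
library(tm.plugin.alceste)

# One made-up text with a single starred variable
path <- tempfile(fileext = ".txt")
writeLines(c("**** *author_smith",
             "A single short text."),
           path)

corpus <- Corpus(AlcesteSource(path, encoding = "UTF-8"))
doc <- corpus[[1]]

# Each element of the corpus is a PlainTextDocument
inherits(doc, "PlainTextDocument")

# Text contents and meta-data of the document
content(doc)
meta(doc)
```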
Author(s)
Milan Bouchet-Valat
See Also
getReaders to list available reader functions.