This is the documentation for NAMECHOP from Semaphore Corporation
This document describes how to use the NAMECHOP program to "chop" name data into component parts, like the examples shown in Section 8.
NAMECHOP is useful for standardizing mailing lists, especially before using duplicate detection programs like DUPDTECT.
While viewing this document, use Find or Search commands to quickly locate any word or phrase of interest.
Contents of this document
Section 1. If You Don't Have Time To Read Documentation...
Section 2. Selecting a Database or Text Input File
Section 3. Specifying the Format of a Fixed-Length Text File
Section 4. Specifying Delimiter Options for a Variable-Length Text File
Section 5. Labeling Database or Fixed-length Text Fields
Section 6. Choosing Variable-length Text Input Fields
Section 7. Choosing Variable-length Text Output Fields
Section 8. Sample Outputs
Section 9. Processing Databases or Fixed-length Text Files
Section 10. Processing Variable-length Text Files
Section 1. If You Don't Have Time To Read Documentation...
Step 1. Start NAMECHOP.EXE and click OK in the initial dialog box. Then find and open a .DBF file containing names to chop.
Step 2. Use the combo box above the grid to select the field containing the unchopped name.
Step 3. Drag the "First name" and "Last name" labels from the "Unassigned labels" list onto the grid.
Step 4. If you have a backup of the file you are about to update, click the "Process entire file" button.
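Step 4 presumes a backup exists, because NAMECHOP updates .DBF files in place. One way to make a backup beforehand is sketched below in Python. The function name backup_file and the ".BAK" suffix are illustrative choices, not part of NAMECHOP.

```python
import shutil
from pathlib import Path

def backup_file(path):
    """Copy `path` to a sibling file with ".BAK" appended; return the copy's path."""
    source = Path(path)
    target = source.with_name(source.name + ".BAK")
    if target.exists():
        # Refuse to overwrite an earlier backup.
        raise FileExistsError(f"backup already exists: {target}")
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return target
```

For example, backup_file("NAMES.DBF") would create NAMES.DBF.BAK in the same folder before you process NAMES.DBF.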
Section 2. Selecting a Database or Text Input File
After NAMECHOP starts running as described in Step 1 above, it will display a dialog box asking which kind of file you wish to process.
NAMECHOP processes dBase (.DBF) and fixed-length text files "in place" by directly updating the original file. NAMECHOP processes variable-length delimited text files by reading records from the original file and writing new records out to a new file. Click the appropriate radio button for the type of file you will be processing, then click the OK button to continue.
NAMECHOP will display a standard system dialog for opening a file. Use the dialog to find and open any file you wish to process.
If you open a file with a ".DBF" extension, NAMECHOP assumes the file is a dBase database, and you can skip to Section 5, Labeling Database or Fixed-length Text Fields, because NAMECHOP will automatically know the database's field names.
If you open a file with any extension other than ".DBF", NAMECHOP assumes the file you selected is a text file. If, as described above, you selected the radio button for processing fixed-length text files, NAMECHOP will assume the text file you opened contains fixed-length fields, and NAMECHOP will display a dialog for defining the file's field format, as described in the next section.
If, as described above, you selected the radio button for processing variable-length delimited text files, NAMECHOP will assume the text file you opened contains variable-length delimited fields, and NAMECHOP will ask you to specify the file's delimiters, as described in Section 4.
Section 3. Specifying the Format of a Fixed-Length Text File
When you open a fixed-length text file as described in the previous section, NAMECHOP assumes the file contains records with fixed-length fields. That is, every record has the same number of fields, and each field has a certain fixed size. For example, here's a file of three fixed-length records, where each record contains a company, address, city, ZIP, and state field:
SEMAPHORE CORP207 GRANADA   APTOS      95003     CA
SEMAFOUR      207 GRENADA DRRIO DEL MAR95003-5007CA
SEMAFOR CO    207 GRAND DR             95003     CA
Because each field has the same fixed length in every record (creating perfectly aligned columns of data through the entire file), fields can run together, without the need for any special "delimiter" characters to separate fields from one another. Records in a fixed-length text file may or may not be separated by record delimiters, such as carriage return or line feed characters. Although this is a very simple file layout to understand, it's fairly wasteful of space, because much of the file contains long runs of blanks.
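To make the idea of fixed-length fields concrete, here is a minimal Python sketch that slices one record into its fields. The widths of 14, 14, 11, 10, and 2 characters are inferred from the sample data above and are an assumption, as is the function name split_fixed. This is not how NAMECHOP itself is implemented.

```python
# Assumed widths for the company, address, city, ZIP, and state fields.
WIDTHS = [14, 14, 11, 10, 2]

def split_fixed(record, widths):
    """Slice one fixed-length record into fields of the given widths."""
    fields = []
    start = 0
    for width in widths:
        fields.append(record[start:start + width])
        start += width
    return fields

# Build the second sample record; only the company needs blank padding.
record = "SEMAFOUR".ljust(14) + "207 GRENADA DR" + "RIO DEL MAR" + "95003-5007" + "CA"

# Strip the trailing blanks that pad each field to its fixed width.
fields = [field.rstrip() for field in split_fixed(record, WIDTHS)]
print(fields)
```

Because every record uses the same widths, the same slicing positions work for every record in the file, which is why no delimiter characters are needed.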
Before NAMECHOP can properly update a fixed-length text file, you need to tell NAMECHOP how many fields are in each record, and how big each field is. This is accomplished with a dialog that NAMECHOP displays after you open a fixed-length text file. (To properly use the dialog, you must already know the number of fields and sizes of fields in the fixed-length text file you are processing.)
If each input record is delimited by at least a carriage return, isn't longer than 1,000 characters, and doesn't contain more than 100 fields, NAMECHOP will try to guess the file's field layout. Otherwise, NAMECHOP will use a default layout that assumes the file contains records with ten fields of fifteen characters each. In either case, you'll probably have to adjust the layout as described below.
The NAMECHOP dialog for indicating the layout of a fixed-length text file always displays the file's first five records in a grid. If the correct number of fields and the correct size for each field is properly set, you will see the file's data line up correctly in each grid column, similar to the aligned sample data shown above. If the number of fields or any field size is set incorrectly, or if the file does not contain fixed-length records, the file's data will appear jumbled and not line up properly in each column. If the file contains variable-length delimited records instead of fixed-length records, you will have to cancel the dialog and tell NAMECHOP to open the file as a variable-length file. Otherwise, adjust the number of fields and/or the sizes of fields until the data lines up properly in the dialog's grid as described below.
To increase the number of fields in a record, double-click on a field to split it into two fields at the spot you double-clicked. To change the number of characters in a field, first click on the field, then use the dialog's control to increase or decrease the field width. Decreasing a field's width to zero deletes the field. The maximum width for an individual field is 255 characters.
If a field isn't completely visible in the grid, clicking the partially hidden field will scroll the grid to display as much of the field as possible. Since the first click of a double-click will scroll a partially hidden field, it's best to click a partially hidden field once to bring it completely into view, then double-click the unobscured field to split the field as desired. Otherwise, the second click of a double-click on a partially hidden field will probably not be in the desired place because of the scroll that occurs between clicks.
The keyboard's tab, shift-tab, and arrow keys can be used to change the grid's currently selected field.
Note that input file control characters such as carriage returns and line feeds will display as black boxes in the layout grid. If the file contains such delimiters (typically at the end of each record), set up a field for the delimiters just as you would any other text in the file. For example, a final field of width 2 should be created for the carriage return and line feed at the end of each record, if those control character record delimiters are present in the file.
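The reason record delimiters need their own field is that they count toward the record size. The Python sketch below assumes a hypothetical layout of a 10-character name, a 5-character ZIP, and a 2-character final field for the carriage return and line feed. The layout and the name iter_records are illustrative only.

```python
# Hypothetical layout: name (10), ZIP (5), and a final 2-character field
# that holds the carriage return and line feed ending each record.
WIDTHS = [10, 5, 2]
RECORD_SIZE = sum(WIDTHS)  # 17 bytes per record, delimiters included

def iter_records(data, record_size):
    """Yield successive fixed-size records from a bytes object."""
    for offset in range(0, len(data) - record_size + 1, record_size):
        yield data[offset:offset + record_size]

data = (b"JOHN DOE".ljust(10) + b"95003" + b"\r\n" +
        b"MARY SMITH" + b"95010" + b"\r\n")
records = list(iter_records(data, RECORD_SIZE))
print(len(records))  # 2
```

If the 2-character delimiter field were left out of the layout, each record would be treated as 15 bytes long, and every record after the first would appear shifted by two more characters than the one before it.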
Because NAMECHOP is only concerned with fields it will be manipulating directly (the input name and the chopped components you wish to output), other contiguous runs of fields that NAMECHOP doesn't care about can be grouped as one large field for convenience. For example, if the file contains a 12-character phone number field alongside an 11-character Social Security number field alongside an 8-character date field, then you can tell NAMECHOP those three fields are simply a single contiguous 31-character field, since NAMECHOP doesn't actually need to manipulate the three fields individually. This can save time setting up the layout for a file with lots of fields.
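The grouping described above is simple arithmetic on field widths, as this short Python sketch shows. The 30-character name field and the function name group_fields are assumptions made for illustration.

```python
def group_fields(widths, first, last):
    """Merge widths[first] through widths[last] into one combined width."""
    return widths[:first] + [sum(widths[first:last + 1])] + widths[last + 1:]

# Hypothetical layout: a 30-character name, then phone, SSN, and date fields.
detailed = [30, 12, 11, 8]
grouped = group_fields(detailed, 1, 3)
print(grouped)  # [30, 31]
```

Both layouts describe records of the same total length, and the name field starts and ends at the same positions in each, so the simpler layout is sufficient when only the name field is being processed.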
Once you have the proper number of fields and field sizes set up in the dialog, click the OK button. NAMECHOP will automatically save the file's layout in a "schema" file with an .SCH extension. For example, if you define a layout for file TEST.TXT, NAMECHOP will create a schema file named TEST.SCH. The next time you open TEST.TXT, NAMECHOP will automatically use TEST.SCH to initialize the dialog's layout, so you don't have to manually create the same layout every time you process the file.
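The naming rule for schema files can be expressed in a line of Python. This sketch shows only how the schema file name is derived from the text file name; the contents of an .SCH file are not described in this document, and the function name schema_path is hypothetical.

```python
from pathlib import PureWindowsPath

def schema_path(text_file):
    """Return the schema file name that corresponds to a text file name."""
    return PureWindowsPath(text_file).with_suffix(".SCH")

print(schema_path("TEST.TXT"))  # TEST.SCH
```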
See Section 5 for instructions on how to process the contents of a fixed-length text file.
Section 4. Specifying Delimiter Options for a Variable-Length Text File
When you open a variable-length text file as described in Section 2, NAMECHOP assumes the file contains records with variable-length fields. That is, every record has the same number of fields, but each field can be a variable size. Adjacent fields are separated by a "delimiter" character, which can be either a comma or a tab. For example, here's a file of three variable-length comma-delimited records, where each record contains a company, address, city, ZIP, and state field:
SEMAPHORE CORP,207 GRANADA,APTOS,95003,CA
SEMAFOUR,207 GRENADA DR,RIO DEL MAR,95003-5007,CA
SEMAFOR CO,207 GRAND DR,,95003,CA
Because fields can start and end at any position within each record, a comma or a tab character is used to separate fields from one another. Each record in a variable-length text file must end with a carriage return, optionally followed by a line feed character. Because fields are only as long as necessary and don't contain long runs of blanks, this file format is generally much more compact than the fixed-length format described in the previous section.
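The following Python sketch illustrates the unquoted format just described: records end with a carriage return (optionally followed by a line feed), and fields are separated by a single delimiter character. The function name read_records is an assumption, and the sketch deliberately ignores quoting.

```python
def read_records(text, delimiter=","):
    """Split text into records, then split each record into unquoted fields."""
    # splitlines() accepts a bare carriage return as well as CR + LF.
    return [line.split(delimiter) for line in text.splitlines() if line]

text = ("SEMAPHORE CORP,207 GRANADA,APTOS,95003,CA\r"
        "SEMAFOR CO,207 GRAND DR,,95003,CA\r\n")
records = read_records(text)
print(records[1])
```

Note that the second record still yields five fields. The missing city appears as an empty string between two adjacent commas, which is how a delimited file keeps every record at the same number of fields.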
Fields in variable-length text files may also be "quoted". This is necessary when the field delimiter character (such as a comma) occurs inside the field data itself. For example, here's a file of three variable-length comma-delimited records, where each record contains a name and phone number field, and names can be in the form "first last" or "last, first":
DOE, JOHN,800-555-1212 <-- three fields?
MARY SMITH,911-688-9200 <-- two fields?
SMITH, JOE,408-555-1212 <-- three fields?
Notice how the first and third records in the above example look like they have three fields instead of two, because a comma appears both as a field delimiter at the end of the field and inside the name field itself. To avoid this ambiguity and make it clear whether a character is inside a field or acting as a field delimiter, variable-length text fields can optionally be quoted with single or double quote marks. Here is a comma-delimited file using double quote marks:
"DOE, JOHN","800-555-1212" <-- two fields!
"MARY SMITH","911-688-9200" <-- two fields!
"SMITH, JOE","408-555-1212" <-- two fields!
Before NAMECHOP can properly process a variable-length text file, you need to tell NAMECHOP what field delimiters and quoting options the file uses. This is accomplished with a dialog that NAMECHOP displays after you open a variable-length text file. Select the proper options in the dialog to indicate whether the file uses tabs or commas for field separators, and whether fields are quoted with single quote marks, double quote marks, or no quotes at all. Then click the OK button to proceed to the dialog for labeling the file's input fields as described in Section 6.
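The effect of the delimiter and quoting options can be demonstrated with Python's standard csv module. This is only a sketch of the general technique, not NAMECHOP's own parser, and the function name parse_line is hypothetical.

```python
import csv

def parse_line(line, delimiter=",", quotechar=None):
    """Parse one delimited record; quotechar=None means fields are unquoted."""
    if quotechar is None:
        reader = csv.reader([line], delimiter=delimiter, quoting=csv.QUOTE_NONE)
    else:
        reader = csv.reader([line], delimiter=delimiter, quotechar=quotechar)
    return next(reader)

# Without quoting, the comma inside the name splits it into two fields.
print(len(parse_line('DOE, JOHN,800-555-1212')))                       # 3
# With double quotes, the comma inside the name is treated as data.
print(len(parse_line('"DOE, JOHN","800-555-1212"', quotechar='"')))    # 2
```

Choosing the wrong options produces the symptoms described in Section 6: too many or too few fields, or quote marks appearing as part of the data.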
Section 5. Labeling Database or Fixed-length Text Fields
Once a dBase or fixed-length text file is open and ready for processing, NAMECHOP will display the file's first record in a large, three-column grid. Each row of the grid corresponds to one field in the record. The leftmost column shows the record's field sizes and names, and the middle column shows the contents of each field in the record. The rightmost column is used as described below to indicate how NAMECHOP should process each field in the file. The width of the rightmost two grid columns can be adjusted by clicking and dragging the vertical grid lines between the column headings. Data in the middle column of the grid can be edited by clicking on a field in the column to select it, then typing in the changes you want.
The group of four arrowhead buttons below the grid allows jumping to the first, previous, next, and last database records. The Jump button can be used to jump to any particular record in the file. NAMECHOP displays and processes every record in a dBase file, whether or not records are flagged for deletion by the database software. If the database contains records flagged for deletion, use your database software to "pack" the file and remove the deleted records before processing the file with NAMECHOP.
Use the combobox above the grid to tell NAMECHOP which input field contains the unchopped name. NAMECHOP also needs to know which database fields will be used to save the chopped name components. This is accomplished by dragging labels from the list box titled Unassigned labels. Drop any label on any database row to indicate NAMECHOP should save the given name component in that field. You can drag-and-drop as many labels as you wish.
When you drop a label on the grid, it appears with a left-pointing arrow. For example, if a grid row is marked "<-- First name", then the first name extracted by NAMECHOP from the unchopped name will be saved in that field.
NAMECHOP treats every field in a dBase file as a "character" field, regardless of the actual type (such as date, numeric, or logical) defined for the field by the database software. Also, NAMECHOP does not re-index dBase files. If you change the contents of an indexed dBase field with NAMECHOP, you should re-index the file with your database software after NAMECHOP processing.
A label can be dropped on the same database field containing the unchopped name (although this technique is not recommended unless you have a backup of the file being processed, since your original data will be overwritten and lost forever).
To remove a label dropped on the grid, double-click the label, or drag an unused label on top of the grid's label to replace it.
Use the Save assignments button to save the current configuration of labels in a file with a .CHP extension. Then the Load assignments button can be used to load the configuration of labels from the same .CHP file, to avoid having to manually drag labels each time a file with the same format is processed.
The Find button can be used to perform arbitrary searches for data.
Section 8 describes chopped name components in more detail. See Section 9 for instructions on how to actually process a file.
Section 6. Choosing Variable-length Text Input Fields
Once a variable-length text file is open and ready for processing, NAMECHOP will display the file's first record in a large, two-column grid. Each row of the grid corresponds to one field in the record. The left column shows the record's field names, which by default are initially labeled "Field 1", "Field 2", and so on. The right column shows the contents of each field in the record.
The two arrowhead buttons below the grid allow stepping to the next file record or "rewinding" back to the first record. The Jump button can be used to jump to any particular record in the file.
Note that if the proper delimiter options were selected when the file was opened as described in Section 4, just the data for each field will appear in a separate grid row, without any surrounding quote marks (otherwise you should cancel the dialog and re-open the text file with the proper delimiter options).
Use the combobox above the grid to tell NAMECHOP which input field contains the unchopped name.
If the first record in the file is a "header" record containing field names (instead of actual data), check the First record is a header (not data) box so that record will be skipped during processing.
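As an illustration of the default field labels and the header option, the Python sketch below labels each field "Field 1", "Field 2", and so on, and optionally skips the first record. The function name label_records and the dictionary representation are assumptions made for this example.

```python
def label_records(records, first_is_header=False):
    """Label each field "Field N", skipping the first record if it is a header."""
    data = records[1:] if first_is_header else records
    return [
        {f"Field {number}": value for number, value in enumerate(row, start=1)}
        for row in data
    ]

records = [
    ["NAME", "PHONE"],                 # header record, not data
    ["DOE, JOHN", "800-555-1212"],
]
print(label_records(records, first_is_header=True))
```

If the header box were left unchecked for this file, the text "NAME" would be treated as a name to be chopped.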
Once you have selected the proper input field in the combobox, click OK to proceed to the dialog for selecting the layout of the output file, as described in the next section.
Section 7. Choosing Variable-length Text Output Fields
After you click OK in the dialog for choosing the variable-length text input field described in the previous section, NAMECHOP will display a dialog for choosing the layout of the output file. (Records in variable-length text files can't be updated "in place" like dBase or fixed-length text files, because the arrangement of output fields may be different than the arrangement of input fields in the original file. Therefore, when processing variable-length text files, NAMECHOP always reads the original file as input and creates a completely new text file as the output.)
The dialog contains two list boxes titled Available fields and Fields to output. The Available fields list displays all fields in the original input file plus all available chopped name outputs that NAMECHOP can generate. Section 8 describes the chopped name components in more detail.
The Fields to output list is initially empty. Drag any field from the Available fields list and drop it in the Fields to output list so that field will be included in the new output file. Or, simply double-click an available field to move it to the output list. Dragging and clicking works in both directions between the two list boxes. You can drag-and-drop as many labels as you wish. Use the Reset button to remove all labels from the output list box.
Note that when a label is dropped on the output list, it is inserted at the position where it was dropped. Similarly, when an available label is double-clicked, it is inserted in the output list before any currently selected item in that list. If no item is selected in the output list, the double-clicked label is appended to the list. When a label is moved back to the list of available fields, that list is re-sorted.
Use the Save layout button to save the current configuration of output labels in a file with a .CHP extension. Then the Load layout button can be used to load the configuration of labels from the same .CHP file, to avoid dragging labels each time a file with the same format is processed.
Use the Text quoting and Field separator controls to specify the field quoting and delimiter characters the output file will use, like the input delimiter options described in Section 4. Check the Suppress line feed output box if you wish record delimiters to be only a carriage return instead of the standard carriage return and line feed. Check the Create output header record box if you want the first record in the output file to be a "header" record that identifies field names.
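To show what these output options mean for the bytes written to the file, here is a sketch using Python's standard csv module. The function and parameter names are hypothetical and merely mirror the dialog's controls. The sketch covers only quoted output; the no-quotes option is omitted.

```python
import csv
import io

def write_records(rows, header=None, delimiter=",", quotechar='"',
                  suppress_line_feed=False):
    """Return delimited, quoted text for rows using the given output options."""
    buffer = io.StringIO(newline="")  # newline="" disables newline translation
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quotechar=quotechar,
        quoting=csv.QUOTE_ALL,  # quote every field
        lineterminator="\r" if suppress_line_feed else "\r\n",
    )
    if header is not None:
        writer.writerow(header)  # optional header record of field names
    writer.writerows(rows)
    return buffer.getvalue()

text = write_records([["Mrs", "Smith"]], header=["Prefix", "Last"],
                     suppress_line_feed=True)
print(repr(text))
```

With suppress_line_feed=True, each record ends with a carriage return only. With the default setting, each record ends with a carriage return and a line feed.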
If the Retain ending periods box is checked, NAMECHOP leaves any existing periods on the ends of chopped name components. For example, "Mrs. Smith" is returned as "Mrs." and "Smith". If the Retain ending periods box is not checked, NAMECHOP will return "Mrs" and "Smith".
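The effect of the Retain ending periods option on a single chopped component can be sketched in Python as follows. The function name finish_component is hypothetical.

```python
def finish_component(component, retain_ending_periods=False):
    """Return a chopped component with or without its ending period(s)."""
    if retain_ending_periods:
        return component
    return component.rstrip(".")

print(finish_component("Mrs.", retain_ending_periods=True))   # Mrs.
print(finish_component("Mrs."))                               # Mrs
```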
At least one label must be moved from the Available fields list to the Fields to output list before the OK button is enabled and an output file can be created. After you have identified all fields you wish to output, click OK. NAMECHOP will display a dialog asking where to create the output file. Select an appropriate file name and destination for the new output file, then click OK. NAMECHOP will then display a dialog for reading the input file and writing the new output file, as described in Section 10.
Section 8. Sample Outputs
The following table shows some examples of how NAMECHOP chops names into their component parts:
Name                   Prefix       First    Middle   Last         Suffix
---------------------  -----------  -------  -------  -----------  ------
Smith                                                 Smith
Smith Sr.                                             Smith        Sr
Mrs. Smith             Mrs                            Smith
Rev. Smith Jr.         Rev                            Smith        Jr
Mr. and Mrs. E.Jones   Mr and Mrs   E                 Jones
Mr. & Mrs. Bix, CPA    Mr & Mrs                       Bix          CPA
Wilson, Mr & Mrs Jim   Mr & Mrs     Jim               Wilson
J. J.Johnson V                      J        J        Johnson      V
Sir T. S. Eliot        Sir          T        S        Eliot
e e cummings, IV                    e        e        cummings     IV
ee cummings                         ee                cummings
Lt. Gen. C James Phd   Lt Gen       C                 James        Phd
W.E.B. DuBois                       W        E B      DuBois
Du Pont, Jackie                     Jackie            Du Pont
Clyde Smith-Jones                   Clyde             Smith-Jones
Mike O'Donnell                      Mike              O'Donnell
O'Donnell, Mike                     Mike              O'Donnell
Jimmy Mac Donald                    Jimmy             Mac Donald
Mr. A. E. Von Sturm    Mr           A        E        Von Sturm
Ms. Beverly D'Angelo   Ms           Beverly           D'Angelo
The above examples assume the Retain ending periods box was not checked.
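To give a feel for the kind of logic involved in chopping, here is a deliberately simplified Python sketch that handles several of the sample names. It is NOT NAMECHOP's actual algorithm, which this document does not describe. The word lists, the function name chop, and the handling of commas are all assumptions, and real names will defeat this toy in many ways.

```python
# Small illustrative word lists (assumptions, far from complete).
PREFIXES = {"mr", "mrs", "ms", "rev", "sir", "lt", "gen", "and", "&"}
SUFFIXES = {"sr", "jr", "cpa", "phd", "ii", "iii", "iv", "v"}
PARTICLES = {"du", "mac", "von"}  # words that join the following last name

def chop(name):
    """Toy name chopper returning prefix, first, middle, last, and suffix."""
    # Treat periods as separators so "E.Jones" becomes "E Jones".
    head, _, tail = name.replace(".", " ").partition(",")
    words = head.split()
    tail_words = tail.split()
    last_words = []
    if tail_words:
        if all(word.lower() in SUFFIXES for word in tail_words):
            words += tail_words   # "Bix, CPA": text after the comma is a suffix
        else:
            last_words = words    # "Du Pont, Jackie": text before the comma is the last name
            words = tail_words
    prefix = []
    while words and words[0].lower() in PREFIXES:
        prefix.append(words.pop(0))
    suffix = []
    # Keep at least one word for the last name unless it is already known.
    keep = 0 if last_words else 1
    while len(words) > keep and words[-1].lower() in SUFFIXES:
        suffix.insert(0, words.pop())
    if not last_words and words:
        last_words = [words.pop()]
        if words and words[-1].lower() in PARTICLES:
            last_words.insert(0, words.pop())
    return {
        "prefix": " ".join(prefix),
        "first": " ".join(words[:1]),
        "middle": " ".join(words[1:]),
        "last": " ".join(last_words),
        "suffix": " ".join(suffix),
    }

print(chop("Rev. Smith Jr."))
print(chop("Wilson, Mr & Mrs Jim"))
```

Like the sample outputs above, this sketch drops ending periods, which corresponds to leaving the Retain ending periods box unchecked.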
When processing dBase or fixed-length text files, NAMECHOP allows you to overwrite the original input name with one of the chopped output fields. This is not recommended, since the original data is then lost forever. A better technique is to save the chopped name components in fields separate from the original input name.
IMPORTANT:
When processing dBase or fixed-length text files, a field used to save a chopped name component must be wide enough to accept the data for that field, otherwise NAMECHOP will display a warning message when it attempts to save data that is too long for the destination field.
NAMECHOP does not re-index dBase files. If you change the contents of an indexed dBase field with NAMECHOP, you should re-index the file with your database software after NAMECHOP processing.
Section 9. Processing Databases or Fixed-length Text Files
Before a dBase or fixed-length text file can be processed by NAMECHOP, you must drag and drop at least one output label onto the grid of fields as described in Section 5; otherwise, the two "Process..." buttons under the grid remain disabled.
Click the Process this record button to process only the currently displayed record. NAMECHOP will update the record in the database and display any new field data in the grid. This button is normally used just to confirm chopping is working properly before using the Process entire file button.
Click the Process entire file button to automatically process all records in the file. If the grid is not already displaying the first record in the file, NAMECHOP will ask if you want to "rewind" the file back to the first record in the database. Then, beginning with the currently displayed record, NAMECHOP will process each record in the file, as though the Process this record button described above was pressed for each record.
Use the Stop button to cancel file processing. During file processing, the Close command in the System menu is ignored. Once file processing has begun, use the Stop button first, then use the System menu Close command or the Exit button.
If the Announce overflows box is checked, NAMECHOP will pause and display a warning whenever an output is too long to fit in the targeted database field. If the Announce overflows box is not checked, NAMECHOP won't pause and fields are truncated without warning. In either case, NAMECHOP displays the total number of field overflows at the end of file processing.
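Field overflow and truncation can be pictured with the following Python sketch, which stores values in a 10-character field and counts the overflows. The function name store_field is hypothetical, and truncating from the right end is an assumption about which characters are dropped.

```python
def store_field(value, width):
    """Fit value into a fixed-width field; return (stored_text, overflowed)."""
    overflowed = len(value) > width
    # Assumed behavior: keep the leftmost characters, pad short values with blanks.
    return value[:width].ljust(width), overflowed

overflow_count = 0
for last_name in ["Smith", "Smith-Jones", "O'Donnell"]:
    stored, overflowed = store_field(last_name, 10)
    overflow_count += overflowed
print(overflow_count)  # 1
```

Only "Smith-Jones" (11 characters) overflows the 10-character field here, so a wider destination field would be needed to store it intact.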
If the Skip first record box is checked, NAMECHOP assumes the first record in the file is only a header record containing field names (not actual data) and therefore ignores that record when the Process entire file button is used. If the Skip first record box is not checked, the Process entire file button processes all records.
If the Hide grid during runs box is checked, NAMECHOP will hide the current record grid and run faster when processing all records in a file because less display updating will be necessary. If the Hide grid during runs box is not checked, the current record grid will always be visible while processing all records in a file, but NAMECHOP will run slower because more display updating will be necessary.
If the Retain ending periods box is checked, NAMECHOP leaves any existing periods on the ends of chopped name components. For example, "Mrs. Smith" is returned as "Mrs." and "Smith". If the Retain ending periods box is not checked, NAMECHOP will return "Mrs" and "Smith".
Section 10. Processing Variable-length Text Files
After selecting output fields and the output file name as described in Section 7, NAMECHOP displays a dialog for controlling variable-length text file processing.
Click the Go button to automatically process all records in the file. If the current input record is not the first record in the file, NAMECHOP will ask if you want to "rewind" the file back to the first record. Then, beginning with the current input record, NAMECHOP will process each record and write the selected output fields to the new output file.
The Jump button can be used to jump to any particular record in the input file before using the Go button.
Use the Stop button to cancel file processing. During file processing, the Close command in the System menu is ignored. Once file processing has begun, use the Stop button first, then use the System menu Close command or the Exit button.