[00:50] *** logchfoo2 starts logging #archiveteam-bs at Fri Apr 01 00:50:34 2016
[00:50] *** logchfoo2 has joined #archiveteam-bs
[00:50] *** midas1 has joined #archiveteam-bs
[00:51] *** decay has quit IRC (Read error: Operation timed out)
[00:51] *** midas has quit IRC (Read error: Operation timed out)
[00:51] Booting your silly program.
[00:51] *** lbft_ has joined #archiveteam-bs
[00:52] *** achip has quit IRC (Ping timeout: 258 seconds)
[00:52] *** lbft has quit IRC (Read error: Operation timed out)
[00:52] *** Baljem_ has joined #archiveteam-bs
[00:52] *** VADemon has quit IRC (hub.efnet.us irc.Prison.NET)
[00:52] *** Baljem has quit IRC (hub.efnet.us irc.Prison.NET)
[00:53] So not my pay grade. BUT.
[00:54] I exited, went down into VIDJAM, and started VIDJAM.EXE and that let me mess around.
[00:55] okay- I'll drop the theater portion, and edit the description to match then
[01:08] *** Mayeau is now known as Mayonaise
[01:14] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[01:15] *** BlueMaxim has joined #archiveteam-bs
[01:18] *** LS64 has joined #archiveteam-bs
[01:18] whats this channel?
[01:18] the off topic dumping ground
[01:18] ah okay
[01:18] to keep the main channel relatively on topic
[01:18] I understand completely
[01:19] so yeah fanficfare, and calibre companion + moon reader pro, on android, are my dream apps combo. they work fantastically.
[01:19] swear i'm not a bot, just really like my fanfic to not die.
[01:19] moon reader, I do need to buy the pro version
[01:19] *** JetBalsa has joined #archiveteam-bs
[01:20] sooooo worth it.
[01:20] *** useretail has joined #archiveteam-bs
[01:20] and I understand, I have a bunch saved, but my phone's screen died and I gotta rip the stuff off before the battery goes too
[01:20] reads epubs mobi, everything, and it respects css in epubs!
[01:20] *** achip has joined #archiveteam-bs
[01:20] that sucks :(
[01:20] I was devastated when I found FLAGfic went down
[01:21] this is why i started saving fanfiction.net
[01:21] yeah, I saw your name listed on the Archive team page for it
[01:22] also think I found some reddit posts you did
[01:22] because I was googling "fanfiction.net archive"
[01:22] a person who should be here in a few hours or so, jesseW, helped me massively, the sql db is his baby. pulled the metadata off the individual stories and made that.
[01:22] oh wow
[01:23] i got relatively internet famous for like a month, because of that.
[01:23] hah
[01:24] So, I guess the archive hosted on ninjawedding doesn't work?
[01:25] I even booted Ubuntu up in a virtual machine and could not get that python app to work
[01:27] I appreciate the help though
[01:29] i'm pretty sure thats dead.
[01:30] well, thats good to know
[01:30] seriously the calibre plugin works fantastically.
[01:30] sounds like exactly what you want, it even takes a list of links, for bulk adding.
[01:31] sounds cool
[01:31] but I hope I can find those old ones hopefully
[01:31] So, your archive, that's different from the 2012 one?
[01:31] i think you might be SOL there.
[01:31] https://archive.org/search.php?query=identifier%3A%28archiveteam-fanfiction-warc-*%29&sort=titleSorter
[01:31] what link are you using?
[01:32] There's that one, and the newer one
[01:32] yeah the warc thing is not mine, i'm much more straightforward, a giant sorted pile of txt files.
[01:33] ack the metadata download stopped
[01:33] fuuu
[01:35] wonder if a download manager would help
[01:35] you got any recommendations for that?
I haven't used one of those in years
[01:36] most archive.org items have a link to download via bittorrent
[01:37] Unfortunately this item doesn't
[01:38] *** Stiletto has joined #archiveteam-bs
[01:38] In fact, the torrent for this project appears to omit all archives after "Fanfiction_C"
[01:38] at least the one listed on IA
[01:39] yeah that torrent thing is weird
[01:39] LS64: here https://archive.org/download/fanfictiondotnet_repack
[01:39] direct links
[01:39] actually yes, jdownloader
[01:40] isn't that packed with malware nowadays?
[01:40] java based, a little slow to open, but handles basically everything.
[01:40] ummm, thats news to me?!
[01:41] if you still have that linux vm, pip install ia
[01:41] the installer is packed with adware crap
[01:41] archive.org download interface
[01:41] BlueMaxim: did not know that, havent installed it in years
[01:42] *** VADemon_ has quit IRC (Quit: left4dead)
[01:42] No, I closed it, also downloading via the virtual machine wouldn't be in my best interest atm
[01:42] there's a windows binary of wget
[01:43] I actually found a clean binary of Jdownloader2
[01:43] google says uget, a crossplatform opensource downloader
[01:43] grab the archive.org page link, it will find the links in it.
[01:44] ~112gb total
[01:44] I only need to worry about the metadata.sqlite to check if its in the archive, right?
[01:45] if you speak sql, sure.
[01:45] I know a fair bit
[01:45] if not, all I need is "s" for now
[01:45] op, gotta go for now
[01:45] thanks again
[01:46] select * from metadata where path like "search string here"
[01:58] JW_work: how do i determine the type of the columns of data?
[02:00] pragma table_info(metadata) returns 0|Path||0||0, and i have no idea what that means.
[02:09] Another worker joins the war
[02:13] *** JesseW has joined #archiveteam-bs
[02:16] JesseW: ok, i found an issue, in the sql file, words is text, it should probably be integer.
sqlbrowser is doing something, since i changed the datatype, but i have no idea what
[02:21] *** tomwsmf-a has joined #archiveteam-bs
[02:29] *** godane has quit IRC (Quit: Leaving.)
[02:52] bsmith093: you can cast items to numeric like this: cast(column_name as numeric)
[02:53] JesseW: i'm trying this first, to remove commas, select words from metadata replace (',','');
[02:53] i have a backup this time.
[03:21] good
[03:22] (sorry, I keep missing your responses)
[03:22] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[03:23] JesseW: np, you have a life. cast keeps throwing syntax errors, and i've tried everything i can think of.
[03:24] no, I just have sound off on my IRC client. :-)
[03:24] hm, paste your code?
[03:25] can sqlite dump a log
[03:25] I think so, yeah
[03:25] cast('Words' as numeric);
[03:25] ok, one thing is don't use single quotes -- use double quotes or nothing
[03:25] with and without parenthesis, single quote
[03:26] single quotes make a literal, double quotes make a column name
[03:26] double quoted column names are required if the column name has weird characters in it (e.g. spaces)
[03:26] these don't
[03:26] sqlite> cast("Words" as numeric);
[03:26] Error: near "cast": syntax error
[03:26] sqlite> cast "Words" as numeric;
[03:26] Error: near "cast": syntax error
[03:27] ok, cast is a function -- you need something like:
[03:27] select cast(Words as numeric) from metadata;
[03:27] that works, but gets the values wrong (it ignores the parts after the commas)
[03:27] when that kept happening, i figured it might be easier to remove commas from the words column... of course it is
[03:28] select cast(replace(Words, ',', '') as numeric) from metadata limit 5;
[03:28] 1180 19447 ... ok that worked
[03:29] :-)
[03:29] i took off the limit and its just spewing numbers at me, is it actually updating?
[03:29] updating?
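The casting problem discussed above can be reproduced in a few lines. This is a minimal sketch using Python's built-in sqlite3 module against an in-memory table; only the `metadata` table and `Words` column names come from the log, while the `Title` column and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical in-memory stand-in for the real metadata table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (Title TEXT, Words TEXT)")
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?)",
    [("short", "1,180"), ("long", "19,447")],
)

# Single quotes make a string literal, so this casts the text 'Words',
# not the column, and yields 0.
literal = conn.execute("SELECT CAST('Words' AS NUMERIC)").fetchone()[0]

# A bare CAST on the column stops at the first comma ('19,447' -> 19).
naive = [row[0] for row in conn.execute(
    "SELECT CAST(Words AS NUMERIC) FROM metadata ORDER BY Title")]

# Removing the commas first gives the intended numbers.
cleaned = [row[0] for row in conn.execute(
    "SELECT CAST(REPLACE(Words, ',', '') AS NUMERIC) "
    "FROM metadata ORDER BY Title")]

print(literal, naive, cleaned)
```

Nothing here modifies the table; like the SELECT in the log, it only reads and converts values on the fly.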
[03:29] it's selecting, not updating
[03:30] note the "select" at the start of the line
[03:30] it's printing out the number of words in each story
[03:30] you probably don't want to print all of them :-)
[03:31] ok, so how do i tell it to replace commas with null, then put that back in the words column?
[03:31] I'm currently running a query to show me the titles of all the stories with more than 100,000 words, ordered by word length.
[03:31] i swear this keyboard iis dying.
[03:32] You probably *don't* want to modify the data, but if you do, you'd use UPDATE
[03:32] i tried that, words > 100000 but it returned nothing because commas.
[03:32] yeah, my query is: select title, cast(replace(Words, ',', '') as numeric) as w from metadata where w > 100000 order by w
[03:32] and it's taking too damn long (probably due to the replace call)
[03:33] update cast(replace(Words, ',', '') as numeric) from metadata;
[03:33] Error: near "(": syntax error
[03:34] yeah, that's not how UPDATE works
[03:34] https://www.sqlite.org/lang_update.html
[03:34] of course it's not... :(
[03:36] seriously, what's with these docs and not giving examples?!
[03:40] update metadata set replace( Words, ',', '');
[03:41] this seems to work update metadata set words= replace( Words, ',', '');
[03:48] nods
[03:49] at least this only thrashes my disk, not the cpu
[03:49] :-)
[03:50] I'm putting it in a new column, created with:
[03:50] alter table metadata add column word_count integer;
[03:50] which will also let me avoid the cast
[03:51] i started the update, so i'll just let it finish.
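Since the docs were faulted above for lacking examples, here is a small sketch of the UPDATE form that eventually worked, again using Python's sqlite3 module on a throwaway in-memory table. The `Title` column and sample rows are assumptions for illustration, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (Title TEXT, Words TEXT)")
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?)",
    [("a", "1,180"), ("b", "19,447"), ("c", None)],
)

# UPDATE takes the shape: UPDATE <table> SET <column> = <expression> [WHERE ...]
# Using the connection as a context manager commits on success and
# rolls back if the statement raises.
with conn:
    conn.execute("UPDATE metadata SET Words = REPLACE(Words, ',', '')")

# The commas are gone, but the column is still declared TEXT, so the
# values remain strings (and NULL stays NULL).
rows = conn.execute(
    "SELECT Words, typeof(Words) FROM metadata ORDER BY Title").fetchall()
print(rows)
```

Because the values are still text after this update, numeric comparisons on `Words` continue to need a cast, which is one argument for a separate integer column.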
[03:51] makes sense
[03:52] bsmith093: you may like this one: http://alicorn.elcenia.com/stories/double.shtml (I haven't finished it yet)
[03:52] did the same thing with calibre, but at least there fanficfare has a word count plugin it can call
[03:54] *** metalcamp has joined #archiveteam-bs
[03:54] sqlite is done
[03:55] nice, yours was faster
[03:55] or started sooner
[03:55] JesseW: also count is being weird
[03:55] select count( select * from metadata where Words > 1000000));
[03:56] well, Words is still a string, you still need the cast
[03:56] argh grumble
[03:58] ok what did i screw up this time
[03:58] update cast (Words as numeric) from metadata;
[03:59] *** JetBalsa has quit IRC (Read error: Connection reset by peer)
[04:02] *** dashcloud has quit IRC (Read error: Operation timed out)
[04:07] JesseW: ping
[04:10] *** dashcloud has joined #archiveteam-bs
[04:10] this is the pickiest, most persnickety language i've ever tried to use! i usually at least know where the syntax errors are after a once over!
[04:11] how about this sqlite> update metadata set words= cast(Words as numeric);
[04:11] LS64: I don't host anything
[04:11] I have no idea what archive you're referring to
[04:11] JesseW: and of course that's the one which finally works.
[04:12] * yipdw owns ninjawedding.org
[04:12] yipdw: LS64 means the tracker you ran for the ffnet grab several years ago. they're looking for a dead story.
[04:16] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[04:23] *** Sk1d has joined #archiveteam-bs
[04:34] The largest one, by word count, is "Covered In Chocolate I"
[04:35] over 12 million words
[04:39] LS64: Behind The Rose does not seem to be listed under Sonic The Hedgehog, sorry. :-(
[04:41] JesseW: does this alter table metadata add column word_count integer; actually add the words as a number?
[04:41] yep
[04:41] i somehow hosed the db AGAIN, so i'm using your thing now.
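The "count is being weird" exchange above comes down to two things: COUNT wants `*` plus a WHERE clause rather than a nested SELECT, and a column declared TEXT is compared as a string unless it is cast. A hedged sketch with Python's sqlite3 module follows; the sample rows and `Title` column are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (Title TEXT, Words TEXT)")
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?)",
    [("a", "999"), ("b", "1500000"), ("c", "20000000"), ("d", None)],
)

# Without a cast, Words is compared as text, so a short value such as
# '999' can rank above '1000000' because '9' sorts after '1'.
text_count = conn.execute(
    "SELECT COUNT(*) FROM metadata WHERE Words > 1000000").fetchone()[0]

# Casting makes the comparison numeric; NULL rows never match.
numeric_count = conn.execute(
    "SELECT COUNT(*) FROM metadata "
    "WHERE CAST(Words AS INTEGER) > 1000000").fetchone()[0]

print(text_count, numeric_count)
```

Note that writing `CAST(Words AS NUMERIC)` back into a column declared TEXT stores text again, so the cast in the WHERE clause (or a separate integer column) is still what makes the comparison numeric.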
[04:41] well, it adds a new column called "word_count" that will treat whatever you put in it as a number
[04:42] update metadata set word_count =words
[04:42] then if you run
[04:42] update metadata set word_count = replace(Words, ',', '');
[04:42] so close
[04:42] it will populate that column with Words, but with commas removed
[04:43] ill take it!
[04:43] :-)
[04:44] also weird thing, sort by words greater than 1000000 top 10, export to csv and tell me if you see "anoubie" followed by gibberish
[04:44] ?
[04:44] i know right?
[04:44] what do you mean "export to csv"
[04:45] i found ".out csv"
[04:45] limit 10
[04:46] hm
[04:47] separator is the pipe character, not fixed width, and already theres a corner case with multi category listings, such as crossovers
[04:50] hm
[04:51] see it? or am i crazy?
[04:52] I haven't seen what you are describing, no.
[04:52] https://0bin.net/paste/3ys-bnn+3kx6V9zh#Hen9FbbUVsgR4s9Rpu96CX-MleFU1rMZ8FJGoM5P/zr
[04:54] huh, whatever, anubie maximum ride story.
[04:54] yours makes way more sense
[04:54] strange
[04:55] is there a row indicator for sqlite?
[04:55] row indicator?
[04:55] like x /6787553
[04:55] https://www.sqlite.org/lang_createtable.html#rowid
[04:55] still not sure what you mean
[04:56] a progress bar for these long-ass updates.
[04:56] ah! not that I know of
[04:57] anyway the word_count is still populating
[04:57] yeah it takes a while
[04:57] what other queries can you think of?
[04:57] and its done
[04:58] this got me that garbage entry last time
[04:58] select * from metadata order by words limit 10;
[04:58] hm
[04:59] tweak for word_count this time
[05:01] https://0bin.net/paste/gALOTFpF6HTou9Ey#aLF7VhWCwMcdbVBlEpMOvQHfCjWbQBuoZp2L0vV3OBn
[05:04] ok how am i screwing this up, i dont see words anywhere https://0bin.net/paste/VoKJ7DhdFt+AnmXc#ABkXwnmRRGp-/qQmNqDl0QhLq8t0ZRLyIcm7TA97fXh
[05:16] ok, hell with this, could you upload the tweaked db to fos?
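The word_count approach above (add an INTEGER column, fill it from the comma-stripped text, then sort and export the top rows) can be sketched end to end with Python's sqlite3 and csv modules. The sample rows, the `Title` column, and the choice to write CSV into a string buffer are assumptions for illustration only.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (Title TEXT, Words TEXT)")
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?)",
    [("a", "1,180"), ("b", "250,000"), ("c", "12,000,000"), ("d", None)],
)

# Add the integer column, then populate it from the comma-stripped text.
# The INTEGER declaration makes SQLite store the digits as real integers.
conn.execute("ALTER TABLE metadata ADD COLUMN word_count INTEGER")
conn.execute("UPDATE metadata SET word_count = REPLACE(Words, ',', '')")

# Largest stories first; rows with a NULL word_count drop out of the WHERE.
top = conn.execute(
    "SELECT Title, word_count FROM metadata "
    "WHERE word_count > 100000 ORDER BY word_count DESC LIMIT 10"
).fetchall()

# Export with the csv module, which quotes fields as needed, instead of
# relying on a pipe-separated dump.
buf = io.StringIO()
csv.writer(buf).writerows(top)
print(buf.getvalue())
```

Writing the export through a CSV writer sidesteps the separator corner case mentioned above, since any field containing a comma or quote is escaped automatically.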
[05:19] *** wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES)
[05:25] I'll make a few more columns, then upload that.
[05:28] *** wp494 has joined #archiveteam-bs
[05:29] bsmith093: one thing is column names are *not* case sensitive... :-)
[05:39] ok, it works, i forgot there were stories where words= null, for some reason
[05:40] JesseW: what columns are you adding?
[05:40] well, a chapter_count, most obviously
[05:41] that exists, chapter.
[05:41] but it's not numeric, same as Words
[05:42] ah, right how do i change that again?
[05:42] same as was done for Words -- alter table then update
[05:43] Publisher is kinda redundant -- it's always the same except for the 79 blank ones.
[05:43] but in this case chapter is a text string thats already just a number, that column can't just be directly changed?
[05:44] nope, Chapters has commas, too
[05:44] damn seriously, there are 1000+ chapter stories?!
[05:45] yep, 5
[05:45] one with 1,687 chapters
[05:45] url?
[05:45] Hetalia - Axis Powers - Spanano - Hola_ I'll Write Letters_ Too
[05:46] https://www.fanfiction.net/s/6913564/1/
[05:47] 200k words and 1600+ chapters. wow.
[05:50] update metadata set chapter_count = replace(Chapters, ',', '');
[05:50] yep
[05:50] doing it now
[05:50] it takes a while
[05:50] this actually makes sense once i figure out the annoyingly picky syntax
[05:51] don't worry, once you get more familiar with it there are whole additional vistas of even *more* picky (and just plain nonsensical) syntax to ... "enjoy".
[05:51] SQL is kinda famous for the ... creative ... syntax choices.
[05:51] It resembles COBOL more than any other currently popular language, I think.
[05:52] for example, why is there an Inner join, and outer join, maybe i just want to merge 2 databases, ever think of that?!
[05:52] heheheh
[05:52] quoting Dilbert, "I learned about that...
in history class"
[05:52] don't worry, you can also refer to them as LEFT JOIN and RIGHT JOIN, too
[05:52] although they aren't exactly equivalent
[05:52] well, some are
[05:53] something to do with what the comparison ignores, right?
[05:53] somewhat, yeah
[05:54] is it possible to delete an item from an upload with the pip ia package?
[05:54] yeah, I don't remember how though
[05:56] *** Start has quit IRC (Ping timeout: 260 seconds)
[05:57] ok i just deleted the db from the repack upload
[05:57] apparently its ia delete "identifier" file
[05:58] makes sense
[05:58] seriously great job on that thing btw :)
[05:58] eh, I didn't write most of it -- I just added some functionality to browsing collections (and not to the cli)
[05:58] much less overhead and way less annoying than the browser uploading
[05:59] chapter_count's done populating, anything else
[05:59] nothing else comes to mind
[06:00] k uploading, thanks for all the sql noob question-answering. much appreciated
[06:00] happy to help
[06:00] SQL is ... entertaining.
[06:01] that's not the word I'd use
[06:01] ok, it's entertaining to *watch* :-P
[06:02] not really; but I am glad to help
[06:02] i feel like integer sorts should be faster, quicksort is a thing.
[06:02] and thanks.
[06:02] add an index
[06:02] oooh, how?
[06:03] https://www.sqlite.org/lang_createindex.html
[06:03] ok, in this case, whats the key?
[06:04] what are you sorting on?
[06:04] that's the key
[06:05] you know this db as well as i do, what would make sense?
[06:06] word_count
[06:06] probably also "story url"
[06:06] maybe path
[06:07] sqlite> create index sorting on metadata (path, "story URL", word_count)
[06:09] doesn't this make the db much bigger?
[06:09] no, don't have the key on *all* of them!
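One reading of the objection above is that each sort key should get its own single-column index instead of one composite index over all three columns. The sketch below tries that with Python's sqlite3 module; the index names are hypothetical, the table is an empty stand-in with only the three columns mentioned, and the exact wording of the query plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "story URL" contains a space, so it needs double quotes as an identifier.
conn.execute(
    'CREATE TABLE metadata (Path TEXT, "story URL" TEXT, word_count INTEGER)')

# One index per column that gets searched or sorted on (names are made up).
conn.execute("CREATE INDEX idx_word_count ON metadata (word_count)")
conn.execute('CREATE INDEX idx_story_url ON metadata ("story URL")')
conn.execute("CREATE INDEX idx_path ON metadata (Path)")

# EXPLAIN QUERY PLAN reports whether a query can use an index; the
# human-readable detail is the last field of each returned row.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM metadata WHERE word_count > 100000 "
    "ORDER BY word_count DESC"
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
print(plan_text)
```

Each index does add to the file size and slows writes a little, so it is worth creating only the ones that match queries actually being run.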
[06:09] three separate indexes, I think
[06:09] it probably does, yeah
[06:10] ok ctrl-c that
[06:13] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[06:17] *** LS64 has quit IRC (Ping timeout: 268 seconds)
[06:17] what's interesting to me is, i've read a little on basic db concepts, and they're supposed to be atomic, but how does that work if i break out of an update?
[06:18] it reverts the changes
[06:19] so thats what the .journal file is for!
[06:20] word_count desc, how does that index handle the nulls?
[06:24] JesseW: happy April Fool's, i think its past midnight, your time
[06:24] *** Start has joined #archiveteam-bs
[06:25] thank you. not actually -- still half an hour
[06:25] mountain time?
[06:25] IDK with regard to nulls
[06:25] I'm heading to sleep, though. g'night
[06:25] greetings from the distant future!
[06:25] *** JesseW has left
[06:42] *** Honno has joined #archiveteam-bs
[07:29] *** schbirid has joined #archiveteam-bs
[07:36] *** vitzli has joined #archiveteam-bs
[07:36] *** Stiletto is now known as Stilett0
[07:40] *** RichardG_ has joined #archiveteam-bs
[07:41] *** RichardG has quit IRC (Ping timeout: 250 seconds)
[07:45] *** RichardG has joined #archiveteam-bs
[07:47] *** RichardG_ has quit IRC (Ping timeout: 260 seconds)
[07:50] *** DFJustin has quit IRC (Read error: Connection reset by peer)
[07:50] *** DFJustin has joined #archiveteam-bs
[07:50] *** swebb sets mode: +o DFJustin
[07:51] *** robink has quit IRC (Remote host closed the connection)
[07:59] *** robink has joined #archiveteam-bs
[07:59] *** robink has quit IRC (Remote host closed the connection)
[08:02] *** robink has joined #archiveteam-bs
[08:47] *** acridAxid has quit IRC (Read error: Operation timed out)
[08:48] *** bwn_ has quit IRC (Read error: Operation timed out)
[08:49] *** yakfish has quit IRC (Read error: Operation timed out)
[08:49] *** yakfish has joined #archiveteam-bs
[08:49] *** acridAxid has joined #archiveteam-bs
[09:01] *** bwn has joined
#archiveteam-bs
[09:16] *** VADemon has joined #archiveteam-bs
[09:39] *** BlueMaxim has quit IRC (Quit: Leaving)
[10:03] *** vitzli has quit IRC (Leaving)
[11:32] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[11:34] *** wyatt8740 has joined #archiveteam-bs
[11:40] *** ohhdemgir has quit IRC (Read error: Connection reset by peer)
[11:48] *** ohhdemgir has joined #archiveteam-bs
[11:55] *** Honno has quit IRC (Read error: Operation timed out)
[12:16] *** Fletcher has quit IRC (Ping timeout: 244 seconds)
[12:16] *** BnA-Rob1n has quit IRC (Ping timeout: 244 seconds)
[12:17] *** Fletcher has joined #archiveteam-bs
[12:18] *** Fletcher_ sets mode: +o Fletcher
[12:18] *** BnA-Rob1n has joined #archiveteam-bs
[12:27] *** bentpins has joined #archiveteam-bs
[12:36] with the archiving going on of google code, is there somewhere i can see how big the actual projects returned are?
[12:36] like i know the live stats show, but what about overall? just interested in seeing the biggest projects we've archived so far and who's archived them
[12:37] *** beardicus has quit IRC (Read error: Operation timed out)
[12:40] *** beardicus has joined #archiveteam-bs
[12:43] *** lbft_ has quit IRC (Read error: Operation timed out)
[12:45] *** Honno has joined #archiveteam-bs
[12:45] *** lbft has joined #archiveteam-bs
[12:47] *** Mayonaise has quit IRC (Read error: Operation timed out)
[12:49] *** Mayonaise has joined #archiveteam-bs
[13:09] *** godane has joined #archiveteam-bs
[13:40] *** davidar has quit IRC (Quit: Connection closed for inactivity)
[13:45] *** Stilett0 is now known as Stiletto
[14:15] *** davidar has joined #archiveteam-bs
[15:29] *** phuz has joined #archiveteam-bs
[15:30] *** phuzion has quit IRC (Read error: Connection reset by peer)
[15:32] *** RichardG has quit IRC (Ping timeout: 499 seconds)
[16:04] *** RichardG has joined #archiveteam-bs
[16:29] *** RichardG has quit IRC (Ping timeout: 250 seconds)
[16:30] *** phuz is now known as phuzion
[17:05]
*** JetBalsa has joined #archiveteam-bs
[17:06] Can someone with good wiki grabbing skills please grab https://www.dokuwiki.org/redstarwiki
[17:17] HCross: I archivebotted it at minimum
[17:19] ah ok
[17:28] done
[17:28] Thanks
[18:25] *** VADemon has quit IRC (Quit: left4dead)
[18:26] *** VADemon has joined #archiveteam-bs
[18:28] *** RichardG has joined #archiveteam-bs
[18:28] archivebot does badly with dokuwikis I think
[18:28] with wikis in general
[18:33] I was just !ao-ing the single page
[18:36] *** Stiletto has quit IRC ()
[18:36] *** Stiletto has joined #archiveteam-bs
[18:37] *** RichardG has quit IRC (Ping timeout: 499 seconds)
[18:43] *** RichardG has joined #archiveteam-bs
[18:47] *** bsmith093 has quit IRC (Ping timeout: 260 seconds)
[19:12] *** espes__ has quit IRC (hub.se efnet.portlane.se)
[19:12] *** chfoo has quit IRC (hub.se efnet.portlane.se)
[19:12] *** ersi has quit IRC (hub.se efnet.portlane.se)
[19:12] *** goekesmi has quit IRC (hub.se efnet.portlane.se)
[19:12] *** koon has quit IRC (hub.se efnet.portlane.se)
[19:12] *** Fletcher_ has quit IRC (hub.se efnet.portlane.se)
[19:12] *** espes___ has joined #archiveteam-bs
[19:13] *** ring has quit IRC (Read error: Operation timed out)
[19:13] *** tomwsmf-a has joined #archiveteam-bs
[19:15] *** ring has joined #archiveteam-bs
[19:16] *** chfoo0 has joined #archiveteam-bs
[19:23] *** koon has joined #archiveteam-bs
[19:24] *** bsmith093 has joined #archiveteam-bs
[19:26] *** goekesmi_ has joined #archiveteam-bs
[19:39] *** ersi has joined #archiveteam-bs
[19:39] *** swebb sets mode: +o ersi
[19:40] *** bwn has quit IRC (Read error: Operation timed out)
[19:51] *** VADemon has quit IRC (Quit: left4dead)
[19:58] *** bwn has joined #archiveteam-bs
[20:14] *** bwn_ has joined #archiveteam-bs
[20:21] *** wvdp has joined #archiveteam-bs
[20:23] *** bwn has quit IRC (Read error: Operation timed out)
[20:44] *** Honno has quit IRC (Ping timeout: 492 seconds)
[21:00] *** BlueMaxim has joined
#archiveteam-bs
[21:00] *** ersi has quit IRC (Ping timeout: 250 seconds)
[21:08] *** koon has quit IRC (hub.se efnet.portlane.se)
[21:15] *** vtyl has quit IRC (Ping timeout: 260 seconds)
[21:17] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[21:18] *** vtyl has joined #archiveteam-bs
[21:19] *** bwn_ is now known as bwn
[21:24] *** chfoo0 is now known as chfoo
[21:33] *** bwn has quit IRC (Read error: Operation timed out)
[21:34] *** bwn has joined #archiveteam-bs
[21:48] http://www.reuters.com/article/us-usa-cyber-reddit-idUSKCN0WX2YF?feedType=RSS&feedName=technologyNews
[21:49] *** Sk1d has quit IRC (hub.se irc.du.se)
[21:56] *** bwn has quit IRC (Leaving)
[21:57] *** bwn has joined #archiveteam-bs
[22:24] *** schbirid has quit IRC (Quit: Leaving)
[22:24] *** Boppen has joined #archiveteam-bs
[22:29] *** wvdp has quit IRC (http://www.kiwiirc.com/ - A hand crafted IRC client)
[23:09] *** midas1 is now known as midas
[23:24] *** Microguru has quit IRC (Read error: Connection reset by peer)
[23:36] *** tomwsmf-a has quit IRC (Read error: Operation timed out)