[00:35] *** hawc145 has joined #archiveteam-bs [00:39] *** HCross has quit IRC (Ping timeout: 370 seconds) [00:50] *** JesseW has joined #archiveteam-bs [00:55] *** BlueMaxim has joined #archiveteam-bs [01:00] *** Cameron_D has quit IRC (Ping timeout: 370 seconds) [01:02] *** phuzion has quit IRC (Ping timeout: 370 seconds) [01:02] *** phuzion has joined #archiveteam-bs [01:07] *** MrRadar has quit IRC (Ping timeout: 370 seconds) [01:19] *** Cameron_D has joined #archiveteam-bs [01:20] *** ErkDog has quit IRC (Read error: Operation timed out) [01:21] *** ErkDog has joined #archiveteam-bs [01:28] *** useretail has quit IRC (Ping timeout: 244 seconds) [01:30] *** tomwsmf-a has quit IRC (Read error: Operation timed out) [01:50] *** ohhdemgir has quit IRC (Ping timeout: 1208 seconds) [01:51] *** JesseW has quit IRC (Ping timeout: 370 seconds) [01:58] *** ohhdemgir has joined #archiveteam-bs [02:02] *** MrRadar has joined #archiveteam-bs [02:03] *** JesseW has joined #archiveteam-bs [02:08] *** dashcloud has quit IRC (Read error: Operation timed out) [02:11] *** dashcloud has joined #archiveteam-bs [02:27] bsmith093: verified that the uploaded MD5s match what I have. [02:27] I think I'm going to delete my copy of the old tarball now. [02:51] *** Stiletto is now known as Stilett0 [03:07] JesseW: i will too, i could use the space [03:14] *** BlueMaxim has quit IRC (Read error: Operation timed out) [03:16] *** BlueMaxim has joined #archiveteam-bs [03:46] since the IA provided torrent won't include the full data, it would probably be good to make a torrent [03:50] JesseW: on it [03:50] :-) [03:50] seriously, though [03:50] JesseW: Seriously though, thanks for your help. [03:51] *** bwn_ has quit IRC (Ping timeout: 492 seconds) [03:53] You are very welcome! I still need to combine the csvs into an sqlite db. [03:53] If you want to hack up a shell command to do that, I'd be grateful. [03:54] There's one csv, called inventory.csv, in each directory -- I need to run the sqlite import on each one. [03:55] JesseW: maybe a for loop [03:55] JesseW: at least something to get them all in one directory, named for the folder they were a database *of* [03:56] JesseW: also, throw that in the fos directory when it's done, i'd want to include that for analysis. [03:56] :) [03:59] sure [04:06] JesseW: Possibly a very stupid question, but can you tell a for loop to go alphabetically? is that a thing [04:06] *** toad2 has quit IRC (Read error: Operation timed out) [04:07] JesseW: it might be faster to just import one by one, and use the command history to speed it along. [04:08] *** toad1 has joined #archiveteam-bs [04:10] bsmith093: pipe the input to the for loop via the sort command [04:10] bsmith093: do you have an example? [04:13] Atluxity: trying to import 30-something csv files into one sql db file, recursively. [04:13] well, I think the `find` command is probably the right answer for finding the csvs [04:13] and it's not 30. [04:13] all in different sub folders of the same path [04:13] It's more like thousands [04:13] one per *folder* [04:13] JesseW: yes, per A B C folder, right? [04:14] nooo [04:14] *** toad1 has quit IRC (Read error: Operation timed out) [04:14] one per folder in Fanfiction [04:14] wait, you went with CATEGORIES?!
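A minimal sketch of the sorted loop suggested above, assuming the layout described in the log (one inventory.csv per category folder); the Fanfiction path is hypothetical:

    # Visit every per-folder inventory.csv in alphabetical order.
    # `find` emits paths in directory order, so the ordering comes from
    # piping through sort, as suggested above.
    find /path/to/Fanfiction -name inventory.csv | sort | while read -r csv; do
        echo "would import: $csv"   # placeholder for the actual sqlite import
    done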
oy [04:14] it was the easiest way to generate them [04:15] so yeah, many thousands [04:16] so yeah, maybe: for f in *.csv ; do sqlite3 final.db ".import $f metadata" ; done [04:16] i've found several gui tools that do this, such as razor sql, free 30 day trial, full functions [04:17] 89,472 to be specific. [04:17] *** Sk1d has quit IRC (Ping timeout: 250 seconds) [04:17] well, sqlite has .import [04:18] *** toad1 has joined #archiveteam-bs [04:24] *** Sk1d has joined #archiveteam-bs [04:26] JesseW: here http://pastebin.com/DQD5h0Li found this [04:27] thx [04:29] if you have php, and a for loop for the files, that should work [04:31] turns out this is a rather edge case problem! [04:31] heh [04:32] I'm actually going to do it this way: cat baz | awk '{print ".import \""$0"\" metadata\nselect \""$0"\", count(*) from metadata;"}' | sqlite3 -csv metadata.sqlite [04:32] (baz contains a list of paths to the csvs) [04:34] JesseW: i know nothing about awk, and regex(?) scares the pants off me. [04:34] :-) [04:34] no regexes in there, actually [04:34] that's actually the simplest-looking regex i've ever seen! [04:35] up to 13,000 [04:35] ah, well that's why, then. [04:36] how many files is it in total, again? [04:36] why not just one huge csv-file-list file? [04:36] JesseW: wc -l baz [04:36] Atluxity: that's a count of categories, not individual files. [04:36] ah [04:37] I have a list of files, but it takes long enough to *run* wc -l that I thought I'd ask bsmith093 to run it again instead of me. :-) [04:37] man, everything about this project is huge, isn't it? [04:37] :) [04:37] big data baby [04:37] eh, huge-*ish* [04:37] JesseW: um, run what, you have the csvs [04:38] just a count of inventory.txt [04:43] up to 257,265 [04:43] JesseW: i had to rebuild it, because i forgot to add Fanfiction_misc.zip [04:44] bsmith093: could you update the description on https://archive.org/details/FanfictionNearlyCompleteArchive to link to the repack? [04:44] probably a faster way, but i figured why take chances [04:44] rebuild what, the torrent? [04:45] it looks like there are fewer than 10 million stories [04:46] JesseW: no, the inventory file. [04:47] ah, I see. [04:47] * JesseW is now reading the Minesweeper fanfic [04:48] JesseW: 6,845,581 lines. [04:48] JesseW: read the zoo tycoon fanfic, bring brain bleach. [04:49] ha [04:50] well if there are fewer than 7 million, then I'm about 7% done [04:50] ~ 567,000 entered [05:03] JesseW: rebuilding md5, btw, is there a way to add a line to an md5sum-generated file? that's very tedious to re-do every time. [05:03] JesseW: also reuploading inventory zip file [05:04] sure, open it in a text editor (like notepad) and paste it in. :-) [05:05] but you don't really need to make the md5.txt file -- IA generates them itself, in _files.xml [05:05] well now i feel really stupid :P [05:06] ;-P [05:06] btw, sudo pip install ia, the archive.org cli interface [05:07] works great, if you have an account, just get your secret key; for creds, run ia configure [05:07] I know, it's very nice -- I've contributed to it [05:07] oh, right, forgot, great work! [05:07] inventory.zip uploaded. [05:08] i also changed the old tar description to point to this upload. [05:08] JesseW: how goes the sql import?
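For readers unfamiliar with awk, here is the one-liner above unrolled into plain shell; same idea, assuming baz holds one csv path per line. One caveat worth knowing: sqlite3's .import creates the table from the first file's header row, but the header rows of subsequent files come in as ordinary data unless stripped first.

    # Loop form of the awk pipeline: each path becomes an .import
    # dot-command plus a running row count, all fed to a single
    # sqlite3 process in CSV mode.
    while read -r csv; do
        printf '.import "%s" metadata\n' "$csv"
        printf 'select "%s", count(*) from metadata;\n' "$csv"
    done < baz | sqlite3 -csv metadata.sqlite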
[05:09] 618,000 [05:09] JesseW: roughly how many per second [05:09] pretty slow [05:11] done about 7,000 of the 89,000 categories [05:11] I think I'm going to turn off the counting [05:12] might speed it up, a bit anyway [05:14] *** BlueMaxim has quit IRC (Read error: Operation timed out) [05:15] *** BlueMaxim has joined #archiveteam-bs [05:20] JesseW: protip, when screenlog.0 is annoying to parse, do screen -S name, then run "script -f logname command" inside it, works much better. [05:21] hm [05:21] yeah, I need to get more familiar with script [05:21] JesseW: -f is flush, it's basically a realtime log file [05:22] removing the count is a LOT faster [05:22] you worked with the python IA package, so you might know, is there a way to verify uploads after the fact? [05:23] what do you mean by "verify"? [05:23] the -v option, i never uploaded the fanfic grab with it, is it too late now to verify? [05:23] https://archive.org/metadata/fanfictiondotnet_repack/files/0 [05:25] https://github.com/jjjake/internetarchive/blob/master/internetarchive/cli/ia_upload.py [05:26] https://github.com/jjjake/internetarchive/blob/master/internetarchive/item.py#L492 [05:27] It's just checking the md5 [05:27] so you can do that after the fact [05:27] see my first link [05:31] *** JetBalsa has quit IRC (Ping timeout: 250 seconds) [05:31] *** JetBalsa has joined #archiveteam-bs [05:37] *** metalcamp has joined #archiveteam-bs [05:40] 25,000 categories done [05:44] *** VADemon has joined #archiveteam-bs [05:45] 31,000 [05:50] whoo, progress! [05:53] 41,000 [05:55] ~38 minutes to go, at current speed [05:57] 45,000 [05:59] JesseW: ~1200/min [05:59] i'm crazy bored, also timestamps rule! [06:02] if you're bored, go analyze some url shorteners. :-) [06:03] http://archiveteam.org/index.php?title=URLTeam [06:03] 52,000 [06:05] i just noticed i have neither wireshark nor virtualbox on the machine. fixing. [06:06] heh. good things to fix [06:06] it is now working on Harry Potter [06:06] and done with that [06:06] the largest plurality of stories [06:06] damn! [06:06] 55,000 [06:07] I want to clean up my (non-virtual) desk top -- but there's a lack of room to clean it off into. :-/ [06:09] attics are great for that :P [06:10] heh -- sadly, no attic [06:10] JesseW: can i still run the urlteam thing without the warrior? [06:11] 59,000 [06:11] bsmith093: you bet [06:11] most of the big contributors do, I think [06:11] ask Atluxity or johtso about doing so [06:16] *** JetBalsa has quit IRC (Ping timeout: 250 seconds) [06:16] *** JetBalsa has joined #archiveteam-bs [06:17] k urlteam grabber is running in screen, whoo! how often does it phone home with its results? [06:17] after each batch, usually 50 items [06:17] ^ [06:19] *** BlueMaxim has quit IRC (Read error: Operation timed out) [06:20] *** BlueMaxim has joined #archiveteam-bs [06:21] 71,000 [06:27] *** JetBalsa has quit IRC (Read error: Operation timed out) [06:28] i'm also running the new fanfic ids through fanficfare, grabbing everything from 10-12 million [06:28] *** JetBalsa has joined #archiveteam-bs [06:29] ls -aR | wc -l returns 71537 files so far [06:30] 956505 ids to go [06:33] nice!
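The after-the-fact check pointed at above amounts to comparing local checksums against the md5s that the item's metadata endpoint reports. A rough sketch, assuming jq is available (the item name is the one from the log; remote.md5 is a hypothetical scratch file):

    # Pull the md5 of every file in the item from the metadata API,
    # then let md5sum verify the local copies. Entries without an md5
    # (some IA-generated files) are skipped.
    item=fanfictiondotnet_repack
    curl -s "https://archive.org/metadata/$item" \
      | jq -r '.files[] | select(.md5 != null) | "\(.md5)  \(.name)"' > remote.md5
    md5sum -c remote.md5   # run from the directory holding the local files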
[06:33] plus i no longer have the .hack sign problem, that character was added to the unsafe chars list, [06:33] I'm reading through: https://www.fanfiction.net/s/1106180/1/ -- which is quite good [06:33] if it would be the first character, it's now an underscore by default [06:36] JesseW: https://www.fanfiction.net/game/Minesweeper/?&srt=1&lan=1&r=10&len=10 [06:36] 2 minesweeper stories over 10K words [06:36] you are reading the other one [06:36] heh [06:37] finished making the database [06:37] now counting items in it [06:39] bet your ass i'm putting both of those minesweeper stories into calibre [06:40] :-) [06:41] apparently we aren't the only ones to like it: " and for that one Minesweeper fic I wrote years ago that got kind of famous. " [06:41] http://www.whoaisnotme.net/anakinmcfly/fanfic.htm [06:41] also seriously, this. Easily the 3rd or 4th most amazing thing I've ever read, given the source material. https://www.fanfiction.net/s/10983213/1/The-True-Love-Loophole [06:43] 6,704,321 in the database [06:43] 4.7GB [06:43] sending it up to FOS now. [06:45] probably about 20 minutes [06:47] *** Honno has joined #archiveteam-bs [06:50] JesseW: i found some random fanfics you might find hilarious [06:50] are they not online? [06:50] most of them aren't [06:51] toss them on FOS -- my IRC client doesn't like file transfers [06:51] thanks, though [06:51] that explains a lot [06:52] annnnd done! [06:52] in fanfic repack, for consistency [06:53] nods [06:54] 9 minutes for the db [06:55] now I'm checking if there are any fics with over 200 chapters [06:56] *** BlueMaxim has quit IRC (Read error: Operation timed out) [06:58] *** BlueMaxim has joined #archiveteam-bs [06:58] there are [07:00] The one with the most chapters is CentiStories, with 985. [07:01] the matrix fanfic in that folder i sent, is from the agent's perspective. [07:01] somewhere in that mess of stories is a 15 MB fanfic [07:01] that's not a typo [07:01] heh [07:02] also this https://www.fanfiction.net/s/4112682/1/The-Subspace-Emissary-s-Worlds-Conquest [07:02] 4 entries with weird values for Rating. [07:03] The rest are T, K, K+ and M. [07:04] 3 million T, one million each of the others [07:04] what values are weird? [07:05] 3.2 million Completed, 3.4 million In-Progress [07:06] how are you getting this info so fats?! [07:06] fast! [07:06] what acn read this? [07:06] SQL, my dear, SQL! [07:06] gui? [07:06] i have sqliteman and sqlbrowser [07:06] Here are the weird values for Rating: [07:06] Eigentlich G, aber wegen einem Satz, einer zeile aus ei (German: "Actually G, but because of one sentence, one line from a..."; truncated in the data) [07:06] PG-13 for language, I suppose [07:06] PG...should be higher, because it's d [07:06] Viol [07:07] One each. [07:07] I just use the sqlite3 command shell. [07:07] what are the id numbers for those, they are probably acient [07:07] probably; I'll let you find them. [07:07] i can't type anymore :P [07:08] there are also 79 that I failed to extract anything from. [07:08] totally blank entries [07:08] ??? [07:08] yeah, except for the path. [07:08] select * from metadata where Status = ""; [07:08] will show them [07:08] BTW, the database is now up at FOS [07:09] metadata.sqlite [07:09] in the usual directory [07:09] what files, example? grabbing it already [07:09] 38 minties to go [07:09] dear $deity the typos are multiplying!? [07:14] *** VADemon has quit IRC (Quit: left4dead) [07:19] *** VADemon has joined #archiveteam-bs [07:20] *** VADemon has quit IRC (Client Quit) [07:20] the 79 appear to be empty files.
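The counts quoted above are one-line queries against the database. A sketch against metadata.sqlite, using only the table and column names that appear in the log (metadata, Rating, Status); anything else here is assumed:

    # Reproduce the rating/status tallies from the shell.
    sqlite3 metadata.sqlite "SELECT Rating, COUNT(*) FROM metadata GROUP BY Rating ORDER BY COUNT(*) DESC;"
    sqlite3 metadata.sqlite "SELECT Status, COUNT(*) FROM metadata GROUP BY Status;"
    # the 79 blank entries mentioned above:
    sqlite3 metadata.sqlite "SELECT * FROM metadata WHERE Status = '';"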
[07:22] *** VADemon has joined #archiveteam-bs [07:23] damn. oh well [07:24] 20 minutes to go grabbing that db. [07:24] *** VADemon has quit IRC (Client Quit) [07:24] solid 2 MB/s though so can't complain, even though it could be going 5 times faster! [07:24] *** VADemon has joined #archiveteam-bs [07:27] *** VADemon_ has joined #archiveteam-bs [07:27] *** VADemon_ has quit IRC (Read error: Connection reset by peer) [07:28] *** VADemon_ has joined #archiveteam-bs [07:29] *** JesseW has quit IRC (Ping timeout: 370 seconds) [07:29] *** VADemon has quit IRC (Ping timeout: 250 seconds) [07:32] *** VADemon_ has quit IRC (Read error: Connection reset by peer) [07:54] *** schbirid has joined #archiveteam-bs [08:00] *** bwn has joined #archiveteam-bs [08:04] *** wyatt8750 has joined #archiveteam-bs [08:04] *** wyatt8740 has quit IRC (Read error: Connection reset by peer) [08:17] *** hawc145 is now known as HCross [08:22] *** bwn has quit IRC (Ping timeout: 1208 seconds) [08:23] *** metalcamp has quit IRC (Ping timeout: 244 seconds) [08:52] *** VADemon has joined #archiveteam-bs [09:05] *** BlueMaxim has quit IRC (Read error: Operation timed out) [09:07] *** BlueMaxim has joined #archiveteam-bs [09:11] *** ohhdemgir has quit IRC (Ping timeout: 260 seconds) [09:33] *** Kaz_ has joined #archiveteam-bs [09:33] *** Kaz has quit IRC (Read error: Operation timed out) [10:04] *** ohhdemgir has joined #archiveteam-bs [10:46] *** Honno has quit IRC (Read error: Connection reset by peer) [10:53] *** Honno has joined #archiveteam-bs [11:01] *** RedType_ has quit IRC (Ping timeout: 258 seconds) [11:33] VADemon, hey you were spot on with the virtualization thing in BIOS, thanks ^^ [11:45] Honno: does it work now? :O [11:45] yeah VADemon [11:45] *** SilSte has quit IRC (Ping timeout: 492 seconds) [11:46] That's fantastic! [11:46] Warrior Helper +1 [11:47] VADemon, if I turn it on, get on the browser, and start running the warrior, then turn off that tab, will the warrior still be archiving? [11:50] Yes, the web page is just for controlling the warrior [11:50] Sweet [11:50] The archiving runs as long as the virtual machine is running [11:50] yea [11:51] o livejournal is being archived huh [11:57] Yeah and "SCRIPTS ONLY" does exclude warriors [12:04] so i found a French magazine called 20 Minutes [12:04] it has pdfs going back to at least 2012 [12:06] *** BlueMaxim has quit IRC (Read error: Operation timed out) [12:16] chfoo, what's the difference between archiveteam_gamemaker_20141118080519.cdx.gz, archiveteam_gamemaker_20141118080519.cdx.idx, gamemaker_20141118080519.megawarc.json.gz and gamemaker_20141118080519.megawarc.warc.os.cdx.gz, for this archive? [12:16] https://archive.org/download/archiveteam_gamemaker_20141118080519 [12:17] For all the other parts as well, they have these files, which is the index thingy I should use? [12:17] I'm using the warc.os.cdx.gz and it seems fine, but it takes ages to load any page so I'm thinking it's not the right index file? [12:28] Honno: the .idx files are indexes for the corresponding cdx files - that is, they contain newline-delimited information about each request/response and where in the warc to find it [12:29] not sure about the idx [12:29] There are two cdx files there [12:29] Which one should I be using?
[12:29] I -think- the megawarc is the right one [12:29] sec [12:29] yeah I think it is as well [12:29] This just takes ages to load a page [12:29] okay, yes [12:29] Honno: so [12:29] Honno: one second [12:29] It's a massive warc collection tho, amounts to 600gbish [12:29] 50gb per warc [12:29] cute [12:31] Honno: http://storage2.static.itmages.com/i/16/0331/h_1459427518_7459135_2a03278340.png [12:31] Honno: notice how the first few files match the name of the item [12:31] ie. in the archive.org/details/XXX url [12:32] so it's just the metadata for that [12:32] * ersi drops jaw [12:32] the description, tags, uploader, and so on [12:32] Ah joepie91, thanks yeah, that's what I guessed but I wasn't sure how the archive team/most people upload warcs like this [12:32] Honno: well, that info is added by IA itself automatically :) [12:32] and I don't know how warcs work or how to do anything and aggh [12:32] mhmk [12:32] It's often in MegaWARCs, with WARCs inside [12:32] Honno: only the blue part is what archiveteam added [12:32] Honno: how are you currently trying to use/read the megawarc? [12:33] joepie91, I'm using pywb, storing it as a collection (didn't concat everything) [12:33] should I do that for faster speeds? [12:33] I haven't used pywb, but does it read the index files automatically? (cc ersi) [12:33] yeah [12:33] I think? [12:33] it says it finds my index files [12:33] because then it shouldn't be slow [12:33] Honno: did it identify the .warc.os.cdx.gz? [12:33] ie. the last item [12:33] in that list [12:33] yep [12:34] only those work for it btw [12:34] strange, then it shouldn't be slow [12:34] all the others it just says "no cdx found" [12:34] that would make sense :p [12:34] yeah heh sorry [12:35] but yeah, I haven't used pywb... but if it keeps being slow, then maybe there's a bug? [12:35] *** dashcloud has quit IRC (Ping timeout: 250 seconds) [12:36] joepie91, I don't know anything about warcs really, so I take it they work by "recreating" every page by doing the GET requests for each piece of content that the original site would have done, rather than store static stuff? [12:36] do I make sense haha [12:39] so I just checked, it took 3.5 minutes to load this page http://sandbox.yoyogames.com/games/174569-innoquous-4 [12:39] on the warc file [12:41] *** dashcloud has joined #archiveteam-bs [12:47] Honno: a WARC file is basically just a big file of HTTP requests and responses. when a site is archived, every request and response is added to the end of the WARC [12:48] Honno: the key is that it stores EVERYTHING [12:48] including HTTP headers [12:48] joepie91, mhm, sweet [12:48] and, in the case of IA's crawler, even DNS requests [12:48] so a WARC viewer can fully recreate the response for a given URL, status code and headers and all [12:48] by reading them out of the WARC file [12:49] that's why it's used by IA; it retains all the important metadata, whereas a simple .html file wouldn't [12:49] Yeah [12:49] joepie91, so uh, how can I web scrape from this warc? [12:49] Honno: 'web scrape' in what sense? [12:50] joepie91, look for data in specific html elements of pages, ie "
", and store the text inside [12:50] I can do it with live websites [12:50] ahh [12:51] Honno: well, two options [12:51] But it takes so long to load stuff locally, it seems impractical with warcs [12:51] Honno: the loading time is unrelated to it being a WARC [12:51] lookup in WARCs is very quick if you have an index file [12:51] as it contains the exact positions of every request [12:51] but your options are basically [12:51] 1. use something like pywb, then scrape like a regular site [12:52] Yeah I was trying 1, but alas the loading times [12:52] 2. use a WARC library, read out the WARC file directly and work from that (a bit faster, but also more work) [12:52] * joepie91 wonders who develops pywb anyway [12:52] WARC library joepie91? like IA's warc library? [12:53] Honno: anything that reads WARC in your language of choice [12:53] :P [12:53] joepie91, I got this geneerated from one WARC right http://puu.sh/o0ENB/e5063f18a8.txt [12:53] Honno: anyway, try filing a bug in pywb [12:53] on* [12:53] about the slowness [12:53] but how do I like, get the content of pages? [12:53] it might just be a bug [12:53] mhm I will thanks [12:54] Honno: I know more about WARC as a format than about the existing libraries for reading / writing it, so I'm probably not the best person to ask about how to work with it [12:54] :P [12:54] aight, thanks for all your help :) [13:01] *** RedType has joined #archiveteam-bs [13:18] *** metalcamp has joined #archiveteam-bs [13:21] http://www.theverge.com/2016/3/10/11195370/hot-wheels-pc-restored-patriot-computer [13:22] *** Stilett0 is now known as Stiletto [14:22] Watching videos like ~ https://youtu.be/2RHEaRlJedA?t=290 ~ thinking, 'that stuff should be archived....' -_- [14:37] *** Stiletto has quit IRC (Read error: Connection reset by peer) [14:44] *** Stiletto has joined #archiveteam-bs [15:06] *** Honno has quit IRC (Ping timeout: 492 seconds) [15:24] *** signius has quit IRC (Read error: Operation timed out) [15:32] *** underscor has joined #archiveteam-bs [15:33] *** bsmith093 has quit IRC (Ping timeout: 633 seconds) [15:39] *** underscor has quit IRC (http://www.mibbit.com ajax IRC Client) [15:40] *** undersco2 has joined #archiveteam-bs [15:56] *** marvinw is now known as ivan` [15:58] https://github.com/chfoo/wpull/issues/319 FYI all the WARCs made by grab-site and wpull (concurrency > 1) don't really work in pywb. might be worth looking into if someone is in a python bugfixing mood [15:58] bug submitter wants to know if there is a more working WARC reader, too. the open wayback repo on github didnt seem to have any docs [15:59] https://github.com/iipc/openwayback/wiki oh there it is [16:07] *** JesseW has joined #archiveteam-bs [16:10] *** Honno has joined #archiveteam-bs [16:11] *** Honno has quit IRC (Read error: Connection reset by peer) [16:11] *** Honno has joined #archiveteam-bs [16:23] *** bsmith093 has joined #archiveteam-bs [16:24] *** JetBalsa has quit IRC (Quit: - nbs-irc 2.39 - www.nbs-irc.net -) [16:30] Anyone else running warriors through docker? 
trying to find out if i'm having gui issues or the warrior isn't taking jobs [16:32] *** signius has joined #archiveteam-bs [16:33] nvm, chrome is bad [16:46] *** JesseW has quit IRC (Ping timeout: 370 seconds) [16:54] *** wyatt8750 is now known as wyatt8740 [16:55] *** SimpBrain has quit IRC (Ping timeout: 246 seconds) [17:04] *** JW_work1 has joined #archiveteam-bs [17:09] *** chazchaz has quit IRC (Read error: Operation timed out) [17:10] *** chazchaz has joined #archiveteam-bs [17:11] *** SimpBrain has joined #archiveteam-bs [17:26] *** robink has quit IRC (Ping timeout: 260 seconds) [17:38] *** robink has joined #archiveteam-bs [18:04] https://i.imgur.com/XaZdF6V.jpg [18:04] light++ [18:04] *** bwn has joined #archiveteam-bs [18:06] *** bwn has quit IRC (Client Quit) [18:06] *** bwn has joined #archiveteam-bs [18:21] *** bwn has quit IRC (Read error: Operation timed out) [18:35] FYI [18:35] buncha free books available only today: http://www.versobooks.com/blogs/2575-psst-downloading-isn-t-stealing-for-today [18:47] *** bwn has joined #archiveteam-bs [19:08] joepie91: grab 'em for me? nowhere to keep them rn :P (feckin housemove) [19:22] *** schbirid has quit IRC (Quit: Leaving) [19:32] *** ikreymer has joined #archiveteam-bs [19:47] sqlite is being stupid, nothing i do actually returns anything [19:49] ivan`: re: wpull warcs with concurrency > 1, i found the issue (kind of a stupid bug) and will have a fix for pywb soon [19:49] ivan`: the issue is with the cdx creation, thanks for reporting it, will let you know when an update is out [19:52] also, for anyone interested, after this bugfix release, the next release of pywb (and hopefully WebArchivePlayer) will support Python 3.3+ as well [19:58] bsmith093: semicolons? [19:59] JW_work1: literally the only thing i've managed to do is somehow tell sqlite3 to completely scrub the db file, so i'm redownloading it anyway. [20:00] 2.5 hours to go [20:01] .schema returns nothing. but again i somehow managed to delete the file and replace it with 800 bytes of semi-random sql. [20:01] *** JW_work1 has quit IRC (Read error: Operation timed out) [20:02] *** JW_work has joined #archiveteam-bs [20:03] any sql people here? i have a massive sqlite file that i'm trying to read. .schema returns nothing, literally. and sqlite3 managed to overwrite the db with sql statements. 5gb, just gone. [20:03] :-( [20:04] JW_work: i don't get it. [20:05] and of course my eta is going UP [20:05] well, first you'll have to re-download the database (and probably keep a copy) [20:05] :( [20:05] on it [20:05] next, .schema doesn't need a semicolon after it, but all select statements *do* [20:06] all i did was, in the sqlite3 shell, .open metadata.sqlite [20:06] was that it? [20:06] yeah, that's … not right [20:06] pass the database in on the command line, i.e. sqlite3 metadata.sqlite [20:06] ah, that's where i screwed up. [20:07] still, you'd think open filename means either create it or open it if it exists. [20:07] yeah, it does seem like that should work [20:08] *** metalcamp has quit IRC (Ping timeout: 244 seconds) [20:09] JW_work: can i do anything with a partial copy, or do i seriously have to wait 3 hours to grab the full thing? [20:09] IDK. you can try it [20:10] JW_work: disk image is malformed, so at least it's reading it. [20:10] oh well, i'll just wait. [20:13] *** JW_work has quit IRC (Quit: Leaving.)
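For anyone else who hits the same wall: a short sketch of the invocation JW_work recommends above, plus an integrity check that is useful before querying a partial or suspect copy:

    # Open the database by passing it on the command line, as advised,
    # instead of using .open inside the shell:
    sqlite3 metadata.sqlite
    #   sqlite> .schema                          -- dot-commands: no semicolon
    #   sqlite> SELECT COUNT(*) FROM metadata;   -- SQL statements: semicolon
    # one-shot sanity check on a possibly truncated download:
    sqlite3 metadata.sqlite 'PRAGMA integrity_check;'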
[20:23] *** JW_work has joined #archiveteam-bs [20:25] looks like Reddit got an NSL [20:25] *** ikreymer has quit IRC (Quit: Page closed) [20:33] *** JW_work has quit IRC (Quit: Leaving.) [20:34] joepie91: yeah, just saw your link in #archivebot :/ [20:34] *** JW_work has joined #archiveteam-bs [20:36] joepie91: i mean, i'm not surprised, but... :/ [20:47] *** JW_work has quit IRC (Quit: Leaving.) [20:49] *** JW_work has joined #archiveteam-bs [20:49] actually joepie91 reddit has decided not to comment on their removal of warrant canaries [20:50] RedType: reddit has decided to comment on their lack of comment. [20:50] damn spez's comment on treading a fine line [20:50] Even with the canaries, we're treading a fine line. The whole thing is icky, which is why we joined Twitter in pushing back. [20:50] it sounds like it answers the comment above, /but it doesn't actually/ [20:50] i love it [20:51] RedType: https://np.reddit.com/r/announcements/comments/4cqyia/for_your_reading_pleasure_our_2015_transparency/d1kpn4k?context=4 is the most important comment, IMO [20:51] so, for those using DigitalOcean: https://twitter.com/joepie91/status/715642213129175040 [20:51] yeah, that's what i was referring to [20:51] joepie91: glaaad, i migraaaateeeeedddd [20:51] commenting on not commenting on their removal of the warrant canary [20:51] *** JW_work has quit IRC (Client Quit) [20:52] RedType: the removal of the canary is pretty fuckin conclusive, though [20:53] joepie91: a whole penny, wow [20:53] *** JW_work has joined #archiveteam-bs [21:05] xmc: it's not about the penny [21:05] it's about the fact that it's SLA credit [21:05] if it was expired for me, it will almost certainly be expired for everybody [21:05] including people who have a TON of SLA credits [21:05] this is just not okay [21:09] *** RichardG has quit IRC (Read error: Operation timed out) [21:09] *** RichardG has joined #archiveteam-bs [21:21] hm [21:22] i got the $100 credit from github student kit a while back, haven't received an email about that expiring yet though [21:22] ..nvm just unlocked my phone and got the gmail notification [21:25] Kazzy: ooh, 100USD, that could buy you... fuck all, DO are expensive as balls :P [21:26] at $5/mo that lasts a very long time! still have $16 left of it now [21:27] Kazzy: yeah, but... cirrus.alfiepates.me costs me about £22 a year, so :P [21:30] *** BlueMaxim has joined #archiveteam-bs [21:33] I dropped mine after a year, a .com is like £8/year so no point keeping the .me really [21:33] Kazzy: as in, the server behind it :P [21:33] (yes, i use FQDNs in general conversation. yes, you should too ;) ) [21:33] yeah, i plan to migrate alfiepates.me to alfiepates.com [21:33] ah, fair enough [21:33] run both domains for a year, then stab alfiepates.me in the back [21:34] £22/yr surely doesn't get you much, 512mb/1cpu at some cheap host? [21:34] ah, ramnode [21:34] yeup, ramnode, of course :P [21:34] cheap, decent enough for alfiepates.me and the other things i run on it [21:35] mail.alfiepates.me is on a £8/m ramnode box (spam/AV is apparently memory hungry) [21:35] yes, i run my own email. :P [21:37] tbh, I wouldn't use fancy names for servers [21:37] I use WhatTheThingDoes.domain.tld [21:38] so storage.harrycross.me is storage.
newsgrabber.harrycross.me is newsgrabbing etc etc [21:38] HCross: all my servers do all sorts [21:38] ^ [21:38] whereas, like, LP-AP1.networktld is obviously this laptop, etc [21:38] ah, I just set up multiple records pointing at the same thing, makes it easier to get at what I want [21:38] HCross: I do that too :P cname is a wonderful thing [21:38] all my VMs at home are vaguely named correctly [21:39] the server itself gets a hostname, then I cname the other stuff to it [21:39] my computers are named randomly [21:39] although I use GApps for mail, and some fancy anycast DNS setup [21:39] although sickrage turned into /everything to do with content acquisition ever/ [21:39] my services and servers have a many-to-many relationship [21:39] many servers run many services [21:39] so the only reasonable naming scheme is unique names for each server [21:39] that are easily identifiable [21:39] because tasks can be spread across servers [21:39] :p [21:40] so, like, "cirrus" is that box, "stratus" is my other VPS box, "alexandria" is my storage server, and so on. [21:40] My actual machines at home are named n1 through n12. VMs get descriptive names. [21:40] atm, I need ThisThingIsProbablyUsingALotOfCPU.domain.tld :P [21:40] oh man [21:40] I am so happy with my new lighting [21:40] :D [21:40] :-D [21:40] joepie91: can you actually see now? [21:40] I now have daylight-level illumination [21:41] shoulda got some swish hue lights, joepie91 [21:41] 3x20W LED [21:41] don't think i've touched a light switch in weeks [21:41] Kazzy: um. no? [21:41] "sorry, I can't turn on my lights, my light switch is updating" [21:41] :P [21:41] lol [21:42] LEDs are great, i have a house of friends who never turn off the lights in their front room because they are basically free to leave on [21:42] never happens :< the switch still works [21:42] i am considering the LIFX bulbs, mind. was recommended them on freenode :P [21:42] until it does [21:42] xmc: 60W is still non-negligible [21:42] :p [21:42] also, picture: https://i.imgur.com/XaZdF6V.jpg [21:43] joepie91: sure ... but when you consider the very low cost of power here + not stubbing your toe, it becomes pretty easy to justify [21:43] not bad [21:43] that's two out of three bars [21:43] they run alongside my wall [21:43] with one of those stand-alone lamp switches [21:43] power in my city is about $0.05 USD/kWh [21:43] and a cable running along the wall [21:43] because it's temporary [21:46] pictures incoming... [21:47] https://imgur.com/a/To8nP [21:47] such professional [21:47] slight tangent, anyone in the UK having issues getting onto microsoft.com? cc HCross [21:48] Fine from my Virgin line [21:48] *** Sk2d has joined #archiveteam-bs [21:49] zz, maybe the first time virgin's been better than BT at something? :) [21:50] Fine from M247 in Manchester too [21:52] you got colo there?
[21:53] Nah, just a VPS [21:54] *** Sk1d has quit IRC (hub.se irc.du.se) [21:54] *** Boppen has quit IRC (hub.se irc.du.se) [21:58] *** kvieta has quit IRC (Read error: Operation timed out) [21:58] *** antomati_ has joined #archiveteam-bs [21:58] *** swebb sets mode: +o antomati_ [21:58] *** Stiletto is now known as Stilett0 [21:59] *** bwn_ has joined #archiveteam-bs [21:59] i ought to go sleep, night all [21:59] *** phuzion has quit IRC (Read error: Operation timed out) [21:59] *** antomatic has quit IRC (Write error: Broken pipe) [22:01] *** phuzion has joined #archiveteam-bs [22:02] *** beardicus has quit IRC (Read error: Operation timed out) [22:02] *** bwn has quit IRC (Read error: Operation timed out) [22:02] *** ivan` has quit IRC (Ping timeout: 635 seconds) [22:04] *** Honno has quit IRC (Ping timeout: 492 seconds) [22:04] *** beardicus has joined #archiveteam-bs [22:05] *** lysobit has quit IRC (Read error: Operation timed out) [22:09] *** Sk2d is now known as Sk1d [22:10] *** GLaDOS has quit IRC (Ping timeout: 633 seconds) [22:12] *** GLaDOS has joined #archiveteam-bs [22:12] *** lysobit has joined #archiveteam-bs [22:12] *** Stilett0 has quit IRC (Ping timeout: 260 seconds) [22:15] you guys may be getting some old web-based radio shows [22:15] one of them is called Web Talk Guys [22:16] i'm getting shows going back to 2002 [22:16] *** Jonimus has quit IRC (Ping timeout: 633 seconds) [22:19] *** kvieta has joined #archiveteam-bs [22:19] *** kvieta has quit IRC (Excess Flood) [22:20] *** Jonimus has joined #archiveteam-bs [22:20] *** swebb sets mode: +o Jonimus [22:21] *** marvinw has joined #archiveteam-bs [22:26] *** kvieta has joined #archiveteam-bs [22:26] *** kvieta has quit IRC (Excess Flood) [22:27] *** kvieta has joined #archiveteam-bs [22:35] *** BlueMaxim has quit IRC (Read error: Operation timed out) [22:35] *** BlueMaxim has joined #archiveteam-bs [22:43] *** dashcloud has quit IRC (Read error: Operation timed out) [22:48] *** dashcloud has joined #archiveteam-bs [23:11] *** undersco2 has quit IRC (Leaving) [23:32] *** dashcloud has quit IRC (Read error: Operation timed out) [23:35] *** dashcloud has joined #archiveteam-bs [23:47] *** BlueMaxim has quit IRC (Read error: Operation timed out) [23:48] *** BlueMaxim has joined #archiveteam-bs