[00:35] *** hawc145 has joined #archiveteam-bs [00:39] *** HCross has quit IRC (Ping timeout: 370 seconds) [00:50] *** JesseW has joined #archiveteam-bs [00:55] *** BlueMaxim has joined #archiveteam-bs [01:00] *** Cameron_D has quit IRC (Ping timeout: 370 seconds) [01:02] *** phuzion has quit IRC (Ping timeout: 370 seconds) [01:02] *** phuzion has joined #archiveteam-bs [01:07] *** MrRadar has quit IRC (Ping timeout: 370 seconds) [01:19] *** Cameron_D has joined #archiveteam-bs [01:20] *** ErkDog has quit IRC (Read error: Operation timed out) [01:21] *** ErkDog has joined #archiveteam-bs [01:28] *** useretail has quit IRC (Ping timeout: 244 seconds) [01:30] *** tomwsmf-a has quit IRC (Read error: Operation timed out) [01:50] *** ohhdemgir has quit IRC (Ping timeout: 1208 seconds) [01:51] *** JesseW has quit IRC (Ping timeout: 370 seconds) [01:58] *** ohhdemgir has joined #archiveteam-bs [02:02] *** MrRadar has joined #archiveteam-bs [02:03] *** JesseW has joined #archiveteam-bs [02:08] *** dashcloud has quit IRC (Read error: Operation timed out) [02:11] *** dashcloud has joined #archiveteam-bs [02:27] bsmith093: verified that the uploaded MD5s match what I have. [02:27] I think I'm going to delete my copy of the old tarball now. [02:51] *** Stiletto is now known as Stilett0 [03:07] JesseW: i will too, i could use the space [03:14] *** BlueMaxim has quit IRC (Read error: Operation timed out) [03:16] *** BlueMaxim has joined #archiveteam-bs [03:46] since the IA provided torrent won't include the full data, it would probably be good to make a torrent [03:50] JesseW: on it [03:50] :-) [03:50] seriously, though [03:50] JesseW: Seriously though, thanks for your help. [03:51] *** bwn_ has quit IRC (Ping timeout: 492 seconds) [03:53] You are very welcome! I still need to combine the csvs into an sqlite db. [03:53] If you want to hack up a shell command to do that, I'd be grateful. [03:54] There's one csv, called inventory.csv, in each directory -- I need to run the sqlite import on each one. [03:55] JesseW: maybe a for loop [03:55] JesseW: at least something to get them all in one directory, named for the folder they were a database *of* [03:56] JesseW: also, throw that in the fos directory when it's done, i'd want to include that for analysis. [03:56] :) [03:59] sure [04:06] JesseW: Possibly a very stupid question, but can you tell a for loop to go alphabetically? is that a thing [04:06] *** toad2 has quit IRC (Read error: Operation timed out) [04:07] JesseW: it might be faster to just import one by one, and use the command history to speed it along. [04:08] *** toad1 has joined #archiveteam-bs [04:10] bsmith093: pipe the input to the for loop via the sort command [04:10] bsmith093: do you have an example? [04:13] Atluxity: trying to import 30-something csv files into one sql db file, recursively. [04:13] well, I think the `find` command is probably the right answer for finding the csvs [04:13] and it's not 30. [04:13] all in different sub folders of the same path [04:13] It's more like thousands [04:13] one per *folder* [04:13] JesseW: yes, per A B C folder, right? [04:14] nooo [04:14] *** toad1 has quit IRC (Read error: Operation timed out) [04:14] one per folder in Fanfiction [04:14] wait, you went with CATEGORIES?!
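A minimal sketch of the sorted loop suggested above, assuming the layout described in the log (one inventory.csv per category folder); the Fanfiction path is hypothetical:

    # Visit every per-folder inventory.csv in alphabetical order.
    # `find` emits paths in directory order, so the ordering comes from
    # piping through sort, as suggested above.
    find /path/to/Fanfiction -name inventory.csv | sort | while read -r csv; do
        echo "would import: $csv"   # placeholder for the actual sqlite import
    done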
oy [04:14] it was the easiest way to generate them [04:15] so yeah, many thousands [04:16] so yeah, maybe: for f in *.csv ; do sqlite3 final.db ".import $f metadata" ; done [04:16] i've found several gui tools that do this, such as razor sql, free 30 day trial, full functions [04:17] 89,472 to be specific. [04:17] *** Sk1d has quit IRC (Ping timeout: 250 seconds) [04:17] well, sqlite has .import [04:18] *** toad1 has joined #archiveteam-bs [04:24] *** Sk1d has joined #archiveteam-bs [04:26] JesseW: here http://pastebin.com/DQD5h0Li found this [04:27] thx [04:29] if you have php, and a for loop for the files, that should work [04:31] turns out this is a rather edge case problem! [04:31] heh [04:32] I'm actually going to do it this way: cat baz | awk '{print ".import \""$0"\" metadata\nselect \""$0"\", count(*) from metadata;"}' | sqlite3 -csv metadata.sqlite [04:32] (baz contains a list of paths to the csvs) [04:34] JesseW: i know nothing about awk, and regex(?) scares the pants off me. [04:34] :-) [04:34] no regexes in there, actually [04:34] that's actually the simplest-looking regex i've ever seen! [04:35] up to 13,000 [04:35] ah, well that's why, then. [04:36] how many files is it in total, again? [04:36] why not just one huge csv-file-list file? [04:36] JesseW: wc -l baz [04:36] Atluxity: that's a count of categories, not individual files. [04:36] ah [04:37] I have a list of files, but it takes long enough to *run* wc -l that I thought I'd ask bsmith093 to run it again instead of me. :-) [04:37] man, everything about this project is huge, isn't it? [04:37] :) [04:37] big data baby [04:37] eh, huge-*ish* [04:37] JesseW: um, run what, you have the csvs [04:38] just a count of inventory.txt [04:43] up to 257,265 [04:43] JesseW: i had to rebuild it, because i forgot to add Fanfiction_misc.zip [04:44] bsmith093: could you update the description on https://archive.org/details/FanfictionNearlyCompleteArchive to link to the repack? [04:44] probably a faster way, but i figured why take chances [04:44] rebuild what, the torrent? [04:45] it looks like there are fewer than 10 million stories [04:46] JesseW: no, the inventory file. [04:47] ah, I see. [04:47] * JesseW is now reading the Minesweeper fanfic [04:48] JesseW: 6,845,581 lines. [04:48] JesseW: read the zoo tycoon fanfic, bring brain bleach. [04:49] ha [04:50] well if there are fewer than 7 million, then I'm about 7% done [04:50] ~ 567,000 entered [05:03] JesseW: rebuilding md5, btw, is there a way to add a line to an md5sum-generated file? that's very tedious to re-do every time. [05:03] JesseW: also reuploading inventory zip file [05:04] sure, open it in a text editor (like notepad) and paste it in. :-) [05:05] but you don't really need to make the md5.txt file -- IA generates them itself, in _files.xml [05:05] well now i feel really stupid :P [05:06] ;-P [05:06] btw, sudo pip install ia, the archive.org cli interface [05:07] works great, if you have an account, just get your secret key; for creds, run ia configure [05:07] I know, it's very nice -- I've contributed to it [05:07] oh, right, forgot, great work! [05:07] inventory.zip uploaded. [05:08] i also changed the old tar description to point to this upload. [05:08] JesseW: how goes the sql import?
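For readers unfamiliar with awk, here is the one-liner above unrolled into plain shell; same idea, assuming baz holds one csv path per line. One caveat worth knowing: sqlite3's .import creates the table from the first file's header row, but the header rows of subsequent files come in as ordinary data unless stripped first.

    # Loop form of the awk pipeline: each path becomes an .import
    # dot-command plus a running row count, all fed to a single
    # sqlite3 process in CSV mode.
    while read -r csv; do
        printf '.import "%s" metadata\n' "$csv"
        printf 'select "%s", count(*) from metadata;\n' "$csv"
    done < baz | sqlite3 -csv metadata.sqlite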
[05:09] 618,000 [05:09] JesseW: roughly how many per second [05:09] pretty slow [05:11] done about 7,000 of the 89,000 categories [05:11] I think I'm going to turn off the counting [05:12] might speed it up, a bit anyway [05:14] *** BlueMaxim has quit IRC (Read error: Operation timed out) [05:15] *** BlueMaxim has joined #archiveteam-bs [05:20] JesseW: protip, when screenlog.0 is annoying to parse, do screen -S name, then run "script -f logname command" inside it, works much better. [05:21] hm [05:21] yeah, I need to get more familiar with script [05:21] JesseW: -f is flush, it's basically a realtime log file [05:22] removing the count is a LOT faster [05:22] you worked with the python IA package, so you might know, is there a way to verify uploads after the fact? [05:23] what do you mean by "verify"? [05:23] the -v option, i never uploaded the fanfic grab with it, is it too late now to verify? [05:23] https://archive.org/metadata/fanfictiondotnet_repack/files/0 [05:25] https://github.com/jjjake/internetarchive/blob/master/internetarchive/cli/ia_upload.py [05:26] https://github.com/jjjake/internetarchive/blob/master/internetarchive/item.py#L492 [05:27] It's just checking the md5 [05:27] so you can do that after the fact [05:27] see my first link [05:31] *** JetBalsa has quit IRC (Ping timeout: 250 seconds) [05:31] *** JetBalsa has joined #archiveteam-bs [05:37] *** metalcamp has joined #archiveteam-bs [05:40] 25,000 categories done [05:44] *** VADemon has joined #archiveteam-bs [05:45] 31,000 [05:50] whoo, progress! [05:53] 41,000 [05:55] ~38 minutes to go, at current speed [05:57] 45,000 [05:59] JesseW: ~1200/min [05:59] i'm crazy bored, also timestamps rule! [06:02] if you're bored, go analyze some url shorteners. :-) [06:03] http://archiveteam.org/index.php?title=URLTeam [06:03] 52,000 [06:05] i just noticed i have neither wireshark nor virtualbox on the machine. fixing. [06:06] heh. good things to fix [06:06] it is now working on Harry Potter [06:06] and done with that [06:06] the largest plurality of stories [06:06] damn! [06:06] 55,000 [06:07] I want to clean up my (non-virtual) desk top -- but there's a lack of room to clean it off into. :-/ [06:09] attics are great for that :P [06:10] heh -- sadly, no attic [06:10] JesseW: can i still run the urlteam thing without the warrior? [06:11] 59,000 [06:11] bsmith093: you bet [06:11] most of the big contributors do, I think [06:11] ask Atluxity or johtso about doing so [06:16] *** JetBalsa has quit IRC (Ping timeout: 250 seconds) [06:16] *** JetBalsa has joined #archiveteam-bs [06:17] k urlteam grabber is running in screen, whoo! how often does it phone home with its results? [06:17] after each batch, usually 50 items [06:17] ^ [06:19] *** BlueMaxim has quit IRC (Read error: Operation timed out) [06:20] *** BlueMaxim has joined #archiveteam-bs [06:21] 71,000 [06:27] *** JetBalsa has quit IRC (Read error: Operation timed out) [06:28] i'm also running the new fanfic ids through fanficfare, grabbing everything from 10-12 million [06:28] *** JetBalsa has joined #archiveteam-bs [06:29] ls -aR | wc -l returns 71537 files so far [06:30] 956505 ids to go [06:33] nice!
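The after-the-fact check pointed at above amounts to comparing local checksums against the md5s that the item's metadata endpoint reports. A rough sketch, assuming jq is available (the item name is the one from the log; remote.md5 is a hypothetical scratch file):

    # Pull the md5 of every file in the item from the metadata API,
    # then let md5sum verify the local copies. Entries without an md5
    # (some IA-generated files) are skipped.
    item=fanfictiondotnet_repack
    curl -s "https://archive.org/metadata/$item" \
      | jq -r '.files[] | select(.md5 != null) | "\(.md5)  \(.name)"' > remote.md5
    md5sum -c remote.md5   # run from the directory holding the local files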
[06:33] plus i no longer have the .hack sign problem, that character was added to the unsafe chars list, [06:33] I'm reading through: https://www.fanfiction.net/s/1106180/1/ -- which is quite good [06:33] if it would be the first character, it's now an underscore by default [06:36] JesseW: https://www.fanfiction.net/game/Minesweeper/?&srt=1&lan=1&r=10&len=10 [06:36] 2 minesweeper stories over 10K words [06:36] you are reading the other one [06:36] heh [06:37] finished making the database [06:37] now counting items in it [06:39] bet your ass i'm putting both of those minesweeper stories into calibre [06:40] :-) [06:41] apparently we aren't the only ones to like it: " and for that one Minesweeper fic I wrote years ago that got kind of famous. " [06:41] http://www.whoaisnotme.net/anakinmcfly/fanfic.htm [06:41] also seriously, this. Easily the 3rd or 4th most amazing thing I've ever read, given the source material. https://www.fanfiction.net/s/10983213/1/The-True-Love-Loophole [06:43] 6,704,321 in the database [06:43] 4.7GB [06:43] sending it up to FOS now. [06:45] probably about 20 minutes [06:47] *** Honno has joined #archiveteam-bs [06:50] JesseW: i found some random fanfics you might find hilarious [06:50] are they not online? [06:50] most of them aren't [06:51] toss them on FOS -- my IRC client doesn't like file transfers [06:51] thanks, though [06:51] that explains a lot [06:52] annnnd done! [06:52] in fanfic repack, for consistency [06:53] nods [06:54] 9 minutes for the db [06:55] now I'm checking if there are any fics with over 200 chapters [06:56] *** BlueMaxim has quit IRC (Read error: Operation timed out) [06:58] *** BlueMaxim has joined #archiveteam-bs [06:58] there are [07:00] The one with the most chapters is CentiStories, with 985. [07:01] the matrix fanfic in that folder i sent, is from the agent's perspective. [07:01] somewhere in that mess of stories is a 15 MB fanfic [07:01] that's not a typo [07:01] heh [07:02] also this https://www.fanfiction.net/s/4112682/1/The-Subspace-Emissary-s-Worlds-Conquest [07:02] 4 entries with weird values for Rating. [07:03] The rest are T, K, K+ and M. [07:04] 3 million T, one million each of the others [07:04] what values are weird? [07:05] 3.2 million Completed, 3.4 million In-Progress [07:06] how are you getting this info so fats?! [07:06] fast! [07:06] what acn read this? [07:06] SQL, my dear, SQL! [07:06] gui? [07:06] i have sqliteman and sqlbrowser [07:06] Here are the weird values for Rating: [07:06] Eigentlich G, aber wegen einem Satz, einer zeile aus ei (German: "Actually G, but because of one sentence, one line from a..."; truncated in the data) [07:06] PG-13 for language, I suppose [07:06] PG...should be higher, because it's d [07:06] Viol [07:07] One each. [07:07] I just use the sqlite3 command shell. [07:07] what are the id numbers for those, they are probably acient [07:07] probably; I'll let you find them. [07:07] i can't type anymore :P [07:08] there are also 79 that I failed to extract anything from. [07:08] totally blank entries [07:08] ??? [07:08] yeah, except for the path. [07:08] select * from metadata where Status = ""; [07:08] will show them [07:08] BTW, the database is now up at FOS [07:09] metadata.sqlite [07:09] in the usual directory [07:09] what files, example? grabbing it already [07:09] 38 minties to go [07:09] dear $deity the typos are multiplying!? [07:14] *** VADemon has quit IRC (Quit: left4dead) [07:19] *** VADemon has joined #archiveteam-bs [07:20] *** VADemon has quit IRC (Client Quit) [07:20] the 79 appear to be empty files.
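The counts quoted above are one-line queries against the database. A sketch against metadata.sqlite, using only the table and column names that appear in the log (metadata, Rating, Status); anything else here is assumed:

    # Reproduce the rating/status tallies from the shell.
    sqlite3 metadata.sqlite "SELECT Rating, COUNT(*) FROM metadata GROUP BY Rating ORDER BY COUNT(*) DESC;"
    sqlite3 metadata.sqlite "SELECT Status, COUNT(*) FROM metadata GROUP BY Status;"
    # the 79 blank entries mentioned above:
    sqlite3 metadata.sqlite "SELECT * FROM metadata WHERE Status = '';"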
[07:22] *** VADemon has joined #archiveteam-bs [07:23] damn. oh well [07:24] 20 minutes to go grabbing that db. [07:24] *** VADemon has quit IRC (Client Quit) [07:24] solid 2 MB/s though so can't complain, even though it could be going 5 times faster! [07:24] *** VADemon has joined #archiveteam-bs [07:27] *** VADemon_ has joined #archiveteam-bs [07:27] *** VADemon_ has quit IRC (Read error: Connection reset by peer) [07:28] *** VADemon_ has joined #archiveteam-bs [07:29] *** JesseW has quit IRC (Ping timeout: 370 seconds) [07:29] *** VADemon has quit IRC (Ping timeout: 250 seconds) [07:32] *** VADemon_ has quit IRC (Read error: Connection reset by peer) [07:54] *** schbirid has joined #archiveteam-bs [08:00] *** bwn has joined #archiveteam-bs [08:04] *** wyatt8750 has joined #archiveteam-bs [08:04] *** wyatt8740 has quit IRC (Read error: Connection reset by peer) [08:17] *** hawc145 is now known as HCross [08:22] *** bwn has quit IRC (Ping timeout: 1208 seconds) [08:23] *** metalcamp has quit IRC (Ping timeout: 244 seconds) [08:52] *** VADemon has joined #archiveteam-bs [09:05] *** BlueMaxim has quit IRC (Read error: Operation timed out) [09:07] *** BlueMaxim has joined #archiveteam-bs [09:11] *** ohhdemgir has quit IRC (Ping timeout: 260 seconds) [09:33] *** Kaz_ has joined #archiveteam-bs [09:33] *** Kaz has quit IRC (Read error: Operation timed out) [10:04] *** ohhdemgir has joined #archiveteam-bs [10:46] *** Honno has quit IRC (Read error: Connection reset by peer) [10:53] *** Honno has joined #archiveteam-bs [11:01] *** RedType_ has quit IRC (Ping timeout: 258 seconds) [11:33] VADemon, hey you were spot on with the virtualization thing in BIOS, thanks ^^ [11:45] Honno: does it work now? :O [11:45] yeah VADemon [11:45] *** SilSte has quit IRC (Ping timeout: 492 seconds) [11:46] That's fantastic! [11:46] Warrior Helper +1 [11:47] VADemon, if I turn it on, get on the browser, and start running the warrior, then turn off that tab, will the warrior still be archiving? [11:50] Yes, the web page is just for controlling the warrior [11:50] Sweet [11:50] The archiving runs as long as the virtual machine is running [11:50] yea [11:51] o livejournal is being archived huh [11:57] Yeah and "SCRIPTS ONLY" does exclude warriors [12:04] so i found a French magazine called 20 Minutes [12:04] it has pdfs going back to at least 2012 [12:06] *** BlueMaxim has quit IRC (Read error: Operation timed out) [12:16] chfoo, what's the difference between archiveteam_gamemaker_20141118080519.cdx.gz, archiveteam_gamemaker_20141118080519.cdx.idx, gamemaker_20141118080519.megawarc.json.gz and gamemaker_20141118080519.megawarc.warc.os.cdx.gz, for this archive? [12:16] https://archive.org/download/archiveteam_gamemaker_20141118080519 [12:17] For all the other parts as well, they have these files, which is the index thingy I should use? [12:17] I'm using the warc.os.cdx.gz and it seems fine, but it takes ages to load any page so I'm thinking it's not the right index file? [12:28] Honno: the .idx files are indexes for the corresponding cdx files - that is, they contain newline-delimited information about each request/response and where in the warc to find it [12:29] not sure about the idx [12:29] There are two cdx files there [12:29] Which one should I be using?
[12:29] I -think- the megawarc is the right one [12:29] sec [12:29] yeah I think it is as well [12:29] This just takes ages to load a page [12:29] okay, yes [12:29] Honno: so [12:29] Honno: one second [12:29] It's a massive warc collection tho, amounts to 600gbish [12:29] 50gb per warc [12:29] cute [12:31] Honno: http://storage2.static.itmages.com/i/16/0331/h_1459427518_7459135_2a03278340.png [12:31] Honno: notice how the first few files match the name of the item [12:31] ie. in the archive.org/details/XXX url [12:32] so it's just the metadata for that [12:32] * ersi drops jaw [12:32] the description, tags, uploader, and so on [12:32] Ah joepie91, thanks yeah, that's what I guessed but I wasn't sure how the archive team/most people upload warcs like this [12:32] Honno: well, that info is added by IA itself automatically :) [12:32] and I don't know how warcs work or how to do anything and aggh [12:32] mhmk [12:32] It's often in MegaWARCs, with WARCs inside [12:32] Honno: only the blue part is what archiveteam added [12:32] Honno: how are you currently trying to use/read the megawarc? [12:33] joepie91, I'm using pywb, storing it as a collection (didn't concat everything) [12:33] should I do that for faster speeds? [12:33] I haven't used pywb, but does it read the index files automatically? (cc ersi) [12:33] yeah [12:33] I think? [12:33] it says it finds my index files [12:33] because then it shouldn't be slow [12:33] Honno: did it identify the .warc.os.cdx.gz? [12:33] ie. the last item [12:33] in that list [12:33] yep [12:34] only those work for it btw [12:34] strange, then it shouldn't be slow [12:34] all the others it just says "no cdx found" [12:34] that would make sense :p [12:34] yeah heh sorry [12:35] but yeah, I haven't used pywb... but if it keeps being slow, then maybe there's a bug? [12:35] *** dashcloud has quit IRC (Ping timeout: 250 seconds) [12:36] joepie91, I don't know anything about warcs really, so I take it they work by "recreating" every page by doing the GET requests for each piece of content that the original site would have done, rather than store static stuff? [12:36] do I make sense haha [12:39] so I just checked, it took 3.5 minutes to load this page http://sandbox.yoyogames.com/games/174569-innoquous-4 [12:39] on the warc file [12:41] *** dashcloud has joined #archiveteam-bs [12:47] Honno: a WARC file is basically just a big file of HTTP requests and responses. when a site is archived, every request and response is added to the end of the WARC [12:48] Honno: the key is that it stores EVERYTHING [12:48] including HTTP headers [12:48] joepie91, mhm, sweet [12:48] and, in the case of IA's crawler, even DNS requests [12:48] so a WARC viewer can fully recreate the response for a given URL, status code and headers and all [12:48] by reading them out of the WARC file [12:49] that's why it's used by IA; it retains all the important metadata, whereas a simple .html file wouldn't [12:49] Yeah [12:49] joepie91, so uh, how can I web scrape from this warc? [12:49] Honno: 'web scrape' in what sense? [12:50] joepie91, look for data in specific html elements of pages, ie "
", and store the text inside [12:50] I can do it with live websites [12:50] ahh [12:51] Honno: well, two options [12:51] But it takes so long to load stuff locally, it seems impractical with warcs [12:51] Honno: the loading time is unrelated to it being a WARC [12:51] lookup in WARCs is very quick if you have an index file [12:51] as it contains the exact positions of every request [12:51] but your options are basically [12:51] 1. use something like pywb, then scrape like a regular site [12:52] Yeah I was trying 1, but alas the loading times [12:52] 2. use a WARC library, read out the WARC file directly and work from that (a bit faster, but also more work) [12:52] * joepie91 wonders who develops pywb anyway [12:52] WARC library joepie91? like IA's warc library? [12:53] Honno: anything that reads WARC in your language of choice [12:53] :P [12:53] joepie91, I got this geneerated from one WARC right http://puu.sh/o0ENB/e5063f18a8.txt [12:53] Honno: anyway, try filing a bug in pywb [12:53] on* [12:53] about the slowness [12:53] but how do I like, get the content of pages? [12:53] it might just be a bug [12:53] mhm I will thanks [12:54] Honno: I know more about WARC as a format than about the existing libraries for reading / writing it, so I'm probably not the best person to ask about how to work with it [12:54] :P [12:54] aight, thanks for all your help :) [13:01] *** RedType has joined #archiveteam-bs [13:18] *** metalcamp has joined #archiveteam-bs [13:21] http://www.theverge.com/2016/3/10/11195370/hot-wheels-pc-restored-patriot-computer [13:22] *** Stilett0 is now known as Stiletto [14:22] Watching videos like ~ https://youtu.be/2RHEaRlJedA?t=290 ~ thinking, 'that stuff should be archived....' -_- [14:37] *** Stiletto has quit IRC (Read error: Connection reset by peer) [14:44] *** Stiletto has joined #archiveteam-bs [15:06] *** Honno has quit IRC (Ping timeout: 492 seconds) [15:24] *** signius has quit IRC (Read error: Operation timed out) [15:32] *** underscor has joined #archiveteam-bs [15:33] *** bsmith093 has quit IRC (Ping timeout: 633 seconds) [15:39] *** underscor has quit IRC (http://www.mibbit.com ajax IRC Client) [15:40] *** undersco2 has joined #archiveteam-bs [15:56] *** marvinw is now known as ivan` [15:58] https://github.com/chfoo/wpull/issues/319 FYI all the WARCs made by grab-site and wpull (concurrency > 1) don't really work in pywb. might be worth looking into if someone is in a python bugfixing mood [15:58] bug submitter wants to know if there is a more working WARC reader, too. the open wayback repo on github didnt seem to have any docs [15:59] https://github.com/iipc/openwayback/wiki oh there it is [16:07] *** JesseW has joined #archiveteam-bs [16:10] *** Honno has joined #archiveteam-bs [16:11] *** Honno has quit IRC (Read error: Connection reset by peer) [16:11] *** Honno has joined #archiveteam-bs [16:23] *** bsmith093 has joined #archiveteam-bs [16:24] *** JetBalsa has quit IRC (Quit: - nbs-irc 2.39 - www.nbs-irc.net -) [16:30] Anyone else running warriors through docker? 
trying to find out if i'm having gui issues or the warrior isn't taking jobs [16:32] *** signius has joined #archiveteam-bs [16:33] nvm, chrome is bad [16:46] *** JesseW has quit IRC (Ping timeout: 370 seconds) [16:54] *** wyatt8750 is now known as wyatt8740 [16:55] *** SimpBrain has quit IRC (Ping timeout: 246 seconds) [17:04] *** JW_work1 has joined #archiveteam-bs [17:09] *** chazchaz has quit IRC (Read error: Operation timed out) [17:10] *** chazchaz has joined #archiveteam-bs [17:11] *** SimpBrain has joined #archiveteam-bs [17:26] *** robink has quit IRC (Ping timeout: 260 seconds) [17:38] *** robink has joined #archiveteam-bs [18:04] https://i.imgur.com/XaZdF6V.jpg [18:04] light++ [18:04] *** bwn has joined #archiveteam-bs [18:06] *** bwn has quit IRC (Client Quit) [18:06] *** bwn has joined #archiveteam-bs [18:21] *** bwn has quit IRC (Read error: Operation timed out) [18:35] FYI [18:35] buncha free books available only today: http://www.versobooks.com/blogs/2575-psst-downloading-isn-t-stealing-for-today [18:47] *** bwn has joined #archiveteam-bs [19:08] joepie91: grab 'em for me? nowhere to keep them rn :P (feckin housemove) [19:22] *** schbirid has quit IRC (Quit: Leaving) [19:32] *** ikreymer has joined #archiveteam-bs [19:47] sqlite is being stupid, nothing i do actually returns anything [19:49] ivan`: re: wpull warcs with concurrency > 1, i found the issue (kind of a stupid bug) and will have a fix for pywb soon [19:49] ivan`: the issue is with the cdx creation, thanks for reporting it, will let you know when an update is out [19:52] also, for anyone interested, after this bugfix release, the next release of pywb (and hopefully WebArchivePlayer) will support Python 3.3+ as well [19:58] bsmith093: semicolons? [19:59] JW_work1: literally the only thing i've managed to do is somehow tell sqlite3 to completely scrub the db file, so i'm redownloading it anyway. [20:00] 2.5 hours to go [20:01] .schema returns nothing. but again i somehow managed to delete the file and replace it with 800 bytes of semi-random sql. [20:01] *** JW_work1 has quit IRC (Read error: Operation timed out) [20:02] *** JW_work has joined #archiveteam-bs [20:03] any sql people here? i have a massive sqlite file that i'm trying to read. .schema returns nothing, literally. and sqlite3 managed to overwrite the db with sql statements. 5gb, just gone. [20:03] :-( [20:04] JW_work: i don't get it. [20:05] and of course my eta is going UP [20:05] well, first you'll have to re-download the database (and probably keep a copy) [20:05] :( [20:05] on it [20:05] next, .schema doesn't need a semicolon after it, but all select statements *do* [20:06] all i did was, in the sqlite3 shell, .open metadata.sqlite [20:06] was that it? [20:06] yeah, that's … not right [20:06] pass the database in on the command line, i.e. sqlite3 metadata.sqlite [20:06] ah, that's where i screwed up. [20:07] still, you'd think open filename means either create it or open it if it exists. [20:07] yeah, it does seem like that should work [20:08] *** metalcamp has quit IRC (Ping timeout: 244 seconds) [20:09] JW_work: can i do anything with a partial copy, or do i seriously have to wait 3 hours to grab the full thing? [20:09] IDK. you can try it [20:10] JW_work: disk image is malformed, so at least it's reading it. [20:10] oh well, i'll just wait. [20:13] *** JW_work has quit IRC (Quit: Leaving.)
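For anyone else who hits the same wall: a short sketch of the invocation JW_work recommends above, plus an integrity check that is useful before querying a partial or suspect copy:

    # Open the database by passing it on the command line, as advised,
    # instead of using .open inside the shell:
    sqlite3 metadata.sqlite
    #   sqlite> .schema                          -- dot-commands: no semicolon
    #   sqlite> SELECT COUNT(*) FROM metadata;   -- SQL statements: semicolon
    # one-shot sanity check on a possibly truncated download:
    sqlite3 metadata.sqlite 'PRAGMA integrity_check;'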
[20:23] *** JW_work has joined #archiveteam-bs [20:25] looks like Reddit got an NSL [20:25] *** ikreymer has quit IRC (Quit: Page closed) [20:33] *** JW_work has quit IRC (Quit: Leaving.) [20:34] joepie91: yeah, just saw your link in #archivebot :/ [20:34] *** JW_work has joined #archiveteam-bs [20:36] joepie91: i mean, i'm not surprised, but... :/ [20:47] *** JW_work has quit IRC (Quit: Leaving.) [20:49] *** JW_work has joined #archiveteam-bs [20:49] actually joepie91 reddit has decided not to comment on their removal of warrant canaries [20:50] RedType: reddit has decided to comment on their lack of comment. [20:50] damn spez's comment on treading a fine line [20:50] Even with the canaries, we're treading a fine line. The whole thing is icky, which is why we joined Twitter in pushing back. [20:50] it sounds like it answers the comment above, /but it doesn't actually/ [20:50] i love it [20:51] RedType: https://np.reddit.com/r/announcements/comments/4cqyia/for_your_reading_pleasure_our_2015_transparency/d1kpn4k?context=4 is the most important comment, IMO [20:51] so, for those using DigitalOcean: https://twitter.com/joepie91/status/715642213129175040 [20:51] yeah, that's what i was referring to [20:51] joepie91: glaaad, i migraaaateeeeedddd [20:51] commenting on not commenting on their removal of the warrant canary [20:51] *** JW_work has quit IRC (Client Quit) [20:52] RedType: the removal of the canary is pretty fuckin conclusive, though [20:53] joepie91: a whole penny, wow [20:53] *** JW_work has joined #archiveteam-bs [21:05] xmc: it's not about the penny [21:05] it's about the fact that it's SLA credit [21:05] if it was expired for me, it will almost certainly be expired for everybody [21:05] including people who have a TON of SLA credits [21:05] this is just not okay [21:09] *** RichardG has quit IRC (Read error: Operation timed out) [21:09] *** RichardG has joined #archiveteam-bs [21:21] hm [21:22] i got the $100 credit from github student kit a while back, haven't received an email about that expiring yet though [21:22] ..nvm just unlocked my phone and got the gmail notification [21:25] Kazzy: ooh, 100USD, that could buy you... fuck all, DO are expensive as balls :P [21:26] at $5/mo that lasts a very long time! still have $16 left of it now [21:27] Kazzy: yeah, but... cirrus.alfiepates.me costs me about £22 a year, so :P [21:30] *** BlueMaxim has joined #archiveteam-bs [21:33] I dropped mine after a year, a .com is like £8/year so no point keeping the .me really [21:33] Kazzy: as in, the server behind it :P [21:33] (yes, i use FQDNs in general conversation. yes, you should too ;) ) [21:33] yeah, i plan to migrate alfiepates.me to alfiepates.com [21:33] ah, fair enough [21:33] run both domains for a year, then stab alfiepates.me in the back [21:34] £22/yr surely doesn't get you much, 512mb/1cpu at some cheap host? [21:34] ah, ramnode [21:34] yeup, ramnode, of course :P [21:34] cheap, decent enough for alfiepates.me and the other things i run on it [21:35] mail.alfiepates.me is on a £8/m ramnode box (spam/AV is apparently memory hungry) [21:35] yes, i run my own email. :P [21:37] tbh, I wouldn't use fancy names for servers [21:37] I use WhatTheThingDoes.domain.tld [21:38] so storage.harrycross.me is storage.
newsgrabber.harrycross.me is newsgrabbing etc etc [21:38] HCross: all my servers do all sorts [21:38] ^ [21:38] whereas, like, LP-AP1.networktld is obviously this laptop, etc [21:38] ah, I just set up multiple records pointing at the same thing, makes it easier to get at what I want [21:38] HCross: I do that too :P cname is a wonderful thing [21:38] all my VMs at home are vaguely named correctly [21:39] the server itself gets a hostname, then I cname the other stuff to it [21:39] my computers are named randomly [21:39] although I use GApps for mail, and some fancy anycast DNS setup [21:39] although sickrage turned into /everything to do with content acquisition ever/ [21:39] my services and servers have a many-to-many relationship [21:39] many servers run many services [21:39] so the only reasonable naming scheme is unique names for each server [21:39] that are easily identifiable [21:39] because tasks can be spread across servers [21:39] :p [21:40] so, like, "cirrus" is that box, "stratus" is my other VPS box, "alexandria" is my storage server, and so on. [21:40] My actual machines at home are named n1 through n12. VMs get descriptive names. [21:40] atm, I need ThisThingIsProbablyUsingALotOfCPU.domain.tld :P [21:40] oh man [21:40] I am so happy with my new lighting [21:40] :D [21:40] :-D [21:40] joepie91: can you actually see now? [21:40] I now have daylight-level illumination [21:41] shoulda got some swish hue lights, joepie91 [21:41] 3x20W LED [21:41] don't think i've touched a light switch in weeks [21:41] Kazzy: um. no? [21:41] "sorry, I can't turn on my lights, my light switch is updating" [21:41] :P [21:41] lol [21:42] LEDs are great, i have a house of friends who never turn off the lights in their front room because they are basically free to leave on [21:42] never happens :< the switch still works [21:42] i am considering the LIFX bulbs, mind. was recommended them on freenode :P [21:42] until it does [21:42] xmc: 60W is still non-negligible [21:42] :p [21:42] also, picture: https://i.imgur.com/XaZdF6V.jpg [21:43] joepie91: sure ... but when you consider the very low cost of power here + not stubbing your toe, it becomes pretty easy to justify [21:43] not bad [21:43] that's two out of three bars [21:43] they run alongside my wall [21:43] with one of those stand-alone lamp switches [21:43] power in my city is about $0.05 USD/kWh [21:43] and a cable running along the wall [21:43] because it's temporary [21:46] pictures incoming... [21:47] https://imgur.com/a/To8nP [21:47] such professional [21:47] slight tangent, anyone in the UK having issues getting onto microsoft.com? cc HCross [21:48] Fine from my Virgin line [21:48] *** Sk2d has joined #archiveteam-bs [21:49] zz, maybe the first time virgin's been better than BT at something? :) [21:50] Fine from M247 in Manchester too [21:52] you got colo there?
[21:53] Nah, just a VPS [21:54] *** Sk1d has quit IRC (hub.se irc.du.se) [21:54] *** Boppen has quit IRC (hub.se irc.du.se) [21:58] *** kvieta has quit IRC (Read error: Operation timed out) [21:58] *** antomati_ has joined #archiveteam-bs [21:58] *** swebb sets mode: +o antomati_ [21:58] *** Stiletto is now known as Stilett0 [21:59] *** bwn_ has joined #archiveteam-bs [21:59] i ought to go sleep, night all [21:59] *** phuzion has quit IRC (Read error: Operation timed out) [21:59] *** antomatic has quit IRC (Write error: Broken pipe) [22:01] *** phuzion has joined #archiveteam-bs [22:02] *** beardicus has quit IRC (Read error: Operation timed out) [22:02] *** bwn has quit IRC (Read error: Operation timed out) [22:02] *** ivan` has quit IRC (Ping timeout: 635 seconds) [22:04] *** Honno has quit IRC (Ping timeout: 492 seconds) [22:04] *** beardicus has joined #archiveteam-bs [22:05] *** lysobit has quit IRC (Read error: Operation timed out) [22:09] *** Sk2d is now known as Sk1d [22:10] *** GLaDOS has quit IRC (Ping timeout: 633 seconds) [22:12] *** GLaDOS has joined #archiveteam-bs [22:12] *** lysobit has joined #archiveteam-bs [22:12] *** Stilett0 has quit IRC (Ping timeout: 260 seconds) [22:15] you guys may be getting some old web-based radio shows [22:15] one of them is called Web Talk Guys [22:16] i'm getting shows going back to 2002 [22:16] *** Jonimus has quit IRC (Ping timeout: 633 seconds) [22:19] *** kvieta has joined #archiveteam-bs [22:19] *** kvieta has quit IRC (Excess Flood) [22:20] *** Jonimus has joined #archiveteam-bs [22:20] *** swebb sets mode: +o Jonimus [22:21] *** marvinw has joined #archiveteam-bs [22:26] *** kvieta has joined #archiveteam-bs [22:26] *** kvieta has quit IRC (Excess Flood) [22:27] *** kvieta has joined #archiveteam-bs [22:35] *** BlueMaxim has quit IRC (Read error: Operation timed out) [22:35] *** BlueMaxim has joined #archiveteam-bs [22:43] *** dashcloud has quit IRC (Read error: Operation timed out) [22:48] *** dashcloud has joined #archiveteam-bs [23:11] *** undersco2 has quit IRC (Leaving) [23:32] *** dashcloud has quit IRC (Read error: Operation timed out) [23:35] *** dashcloud has joined #archiveteam-bs [23:47] *** BlueMaxim has quit IRC (Read error: Operation timed out) [23:48] *** BlueMaxim has joined #archiveteam-bs