[09:04] <midas> so, was that a small internet earthquake?
[10:54] <schbirid> have some html injection https://startpage.com/do/search?query=Reginald+D.+Hunter&cat=web&pl=chrome&language=english
[13:30] <midas> can someone point me in the right direction? i have a list of usernames for home.xmsnet.nl now i just need to figure out how i can get wget to work with this to warc it.
[13:34] <midas> right, bad snort.
[13:35] <midas> can someone point me in the right direction? i have a list of usernames for home.xmsnet.nl now i just need to figure out how i can get wget to work with this to warc it.
[13:49] <SadDM> midas: is there any correlation between the usernames and urls?
[13:51] <rocode_> home.xmsnet.nl/username IIRC
[13:51] <SadDM> hmm, it looks like the urls are http://home.xmsnet.nl/username
[13:51] <SadDM> right
[13:52] <rocode> Which version of wget was warc enabled?
[13:52] <SadDM> so the easiest thing to do is to make a text file with a list of those urls and then do a wget -i <filename>
[13:53] <SadDM> 1.14 maybe?
[13:54] <SadDM> yup... >=1.14 has warc baked in
[13:56] <SadDM> midas: with what I've suggested you'll end up with a warc for every line of the text file... if that's not ultimately desirable, you can megawarc them all together after the fact.
[13:58] <rocode> There are only 156 sites. Not too bad.
[14:27] <midas> SadDM: sorry, was out of irc for a sec
[14:27] <midas> yeah, -i would work with --warc-file ?
[14:28] <SadDM> it sure does
[14:28] <SadDM> you just end up with one for every url in the file
[14:28] <midas> ok, lets start a screen :p
[14:28] <midas> thats no issue
[14:32] <midas> wget wants an argument for warc-file
[14:33] <midas> (i feel stupid right now :p)
[14:35] <SadDM> oh... uh
[14:35] <SadDM> hmm... maybe I got ahead of myself
[14:35] <ohhdemgir> I keep cocking up my media types, wish users could move items :|
[14:35] <midas> my guess was just wget -i <file> -m --warc-file but thats no balony
[14:35] <SadDM> I know that I've done something like this
[14:36] <SadDM> maybe I just built a bunch of wget commands in a file and then piped it through a shell
[14:37] <SadDM> sorry midas, I think I've given you slightly bad advice
[14:37] <SadDM> are you on a unix machine?
[14:40] <midas> yeah
[14:40] <SadDM> something like this should generate the wgets: cat usernames.txt|sed 's/\(.*\)/wget -m --warc-file=\1 http:\/\/home.xmsnet.nl\/\1\//' > wgets
[14:41] <SadDM> note that I just wrote that cold and didn't test it or anything
[14:41] <SadDM> then just "sh wgets"
[14:41] <midas> lol. thanks! i think that would work :)
[14:42] <SadDM> :-D it's close-ish at least
[14:42] <midas> indeed, and thats more than i could do on short notice :p
[14:48] <godane> so i fixed some of the uploads of sydney morning herald and Australian women's weekly
[14:48] <godane> some of the uploads were incomplete downloads
[14:50] <Smiley> midas: around bud?
[14:50] <godane> so i'm mirroring more glenn beck episodes
[14:51] <midas> Smiley: yeah
[14:51] <midas> im always somewhere
[14:51] <Smiley> midas: how are/were you testing cartoon hd on android,
[14:51] <Smiley> you running an avd on your system?
[14:52] <midas> didnt have time since last week, work said i had to do something :<
[14:53] <midas> i think ill debug it tonight again
[14:53] <midas> just have to get this grab started
[14:54] <Smiley> yeah I'm just trying to setup an AVD so I can just grab stuff rather than grabbing on my phone and pushing across the network
[14:54] <Smiley> ah got it working now :D
[14:55] * Smiley waits to see if it loads
[14:57] <midas> ah bloody cool Smiley !
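A loop-based way to build the same command file discussed around 14:40, as an alternative to the sed one-liner. This is only a sketch: the function name `gen_wgets` is made up for illustration, and it assumes usernames.txt holds one username per line with no whitespace or shell metacharacters.

```shell
#!/bin/sh
# gen_wgets (hypothetical helper): read usernames on stdin and print
# one wget command per username, each writing its own WARC.
# Assumption: usernames contain no spaces or shell metacharacters.
gen_wgets() {
    while IFS= read -r user; do
        # Skip blank lines so no empty --warc-file= argument is produced.
        [ -n "$user" ] || continue
        printf 'wget -m --warc-file=%s http://home.xmsnet.nl/%s/\n' "$user" "$user"
    done
}

# Usage, following the chat:
#   gen_wgets < usernames.txt > wgets
#   sh wgets
```

Printing the commands to a file first, rather than running them directly, keeps the list easy to inspect before anything is fetched.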
[14:58] <Smiley> midas: hmmmmm just sitting on "android" screen atm :/
[15:24] <Smiley> midas: just run android avd
[15:24] <Smiley> and create a new avd using a nice h/w
[15:26] <Smiley> or at least that's what I'm trying to do.
[15:26] <Smiley> it's quite slow.
[15:27] <Smiley> as I just told it to create a 5Gb sdcard D:
[15:37] <midas> lol
[15:37] * midas throws stones at midas
[15:46] <Smiley> grrr
[15:46] <Smiley> none will boot D:
[15:51] <Smiley> yey got one booting \o/
[15:59] <Smiley> and using android monitor, it has a file explorer with push/pull
[15:59] <Smiley> or i can use adb. sweeeet
[16:45] <yipdw> lol Wheeler
[16:45] <yipdw> "Simply put, when a consumer buys a specified bandwidth, it is commercially unreasonable and thus a violation of this proposal to deny them the full connectivity and the full benefits that connection enables."
[16:45] <yipdw> evidently Wheeler has never used Comcast services, only lobbied for them
[17:10] <exmic> hahaha
[18:02] <schbirid> lol wtf, startpage.com's "community" link goes to their facebook site
[18:07] <midas> so this is about archive.org and a little bit about archiveteam also ;) http://www.nu.nl/weekend/3769630/bibliothecaris-van-internet-wil-websites-niet-verloren-laten-gaan.html
[18:07] <midas> pretty big website in .nl
[20:37] <godane> i'm grabbing Bomb Patrol Afghanistan
[20:37] <godane> cause it aired on G4
[20:37] <godane> i'm getting the 720p copies
[20:43] <exmic> ./wg 25
[22:17] <godane> so i'm up to 7600 items now in my godaneinbox
[22:53] <godane> so i'm trying to download a mov file from the wayback machine and i can't get the full file
[22:53] <godane> i'm trying this one at the moment: https://web.archive.org/web/20060602014144/http://www.commandn.tv/cN/044/commandN-044-h264.mov
[22:54] <godane> the wayback machine url will just stop about 32.7mb into the file
[22:54] <godane> even with wget
[23:21] <godane> some good news
[23:21] <godane> looks like i may be able to get some mp4 files of commandN through veoh.com
[23:25] <godane> and also here is the sitemap of veoh.com: http://www.veoh.com/sitemap.xml
[23:30] <godane> so based on one of the veoh.com html files you can get the full path of videos from them
[23:40] <godane> here is some example code for grabbing the video files from veoh.com:
[23:40] <godane> curl http://www.veoh.com/watch/e20095 | grep fullPreviewHashHighPath | sed 's|.*fullPreviewHashHighPath":"||g' | sed 's|",".*||g' | sed 's|.*/content|content|g'
[23:41] <godane> you may want to add a -O h$id.mp4 or something or you get file names like this below:
[23:41] <godane> h20095.mp4?ct=ebfae74e540fcd7e297a588892beca41022dc1cd2d5c355d
[23:53] <godane> looks like there is no page on archiveteam.org for veoh.com
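A rough sketch of the two steps godane describes at 23:40 and 23:41: pulling the `fullPreviewHashHighPath` value out of a watch page, and deriving a file name without the `?ct=...` suffix. The function names `extract_path` and `clean_name` are invented here, and the exact layout of the page is an assumption (a `"fullPreviewHashHighPath":"..."` pair on a single line); the real markup may differ.

```shell
#!/bin/sh
# extract_path (hypothetical helper): read page HTML on stdin and print
# the value of the fullPreviewHashHighPath key, up to the next quote.
# Assumption: the key and its value sit on one line, as the chat's
# grep/sed pipeline implies.
extract_path() {
    sed -n 's|.*fullPreviewHashHighPath":"\([^"]*\)".*|\1|p'
}

# clean_name (hypothetical helper): turn a video URL or path into a
# local file name by dropping the directory part and any ?query suffix.
clean_name() {
    name=${1##*/}                  # keep only what follows the last slash
    printf '%s\n' "${name%%\?*}"   # cut everything from the first '?'
}

# Usage, untested against the live site:
#   url=$(curl -s http://www.veoh.com/watch/e20095 | extract_path)
#   wget -O "$(clean_name "$url")" "$url"
```

If the page escapes slashes in the value (for example as `\/`), the extracted string would need unescaping before it can be fetched.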