#archiveteam-bs 2014-05-15,Thu


Time Nickname Message
09:04 🔗 midas so, was that a small internet earthquake?
10:54 🔗 schbirid have some html injection https://startpage.com/do/search?query=Reginald+D.+Hunter&cat=web&pl=chrome&language=english
13:30 🔗 midas can someone point me in the right direction? i have a list of usernames for home.xmsnet.nl, now i just need to figure out how to get wget to work with it to WARC the sites.
13:34 🔗 midas right, bad snort.
13:35 🔗 midas can someone point me in the right direction? i have a list of usernames for home.xmsnet.nl, now i just need to figure out how to get wget to work with it to WARC the sites.
13:49 🔗 SadDM midas: is there any correlation between the usernames and urls?
13:51 🔗 rocode_ home.xmsnet.nl/username IIRC
13:51 🔗 SadDM hmm, it looks like the urls are http://home.xmsnet.nl/username
13:51 🔗 SadDM right
13:52 🔗 rocode Which version of wget added WARC support?
13:52 🔗 SadDM so the easiest thing to do is to make a text file with a list of those urls and then do a wget -i <filename>
13:53 🔗 SadDM 1.14 maybe?
13:54 🔗 SadDM yup... 1.14 and later have WARC baked in
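A quick way to check whether a particular wget has the WARC options is to look for `--warc-file` in its help text, rather than comparing version numbers. A minimal sketch, assuming a POSIX shell; the function name `has_warc_support` is hypothetical:

```shell
#!/bin/sh
# Sketch: does this wget know about WARC output (added in wget 1.14)?
# Looks for --warc-file in the help text instead of parsing version numbers.

# Optional $1 names the wget binary to check; defaults to "wget".
has_warc_support() {
  "${1:-wget}" --help 2>/dev/null | grep -q -- '--warc-file'
}

if has_warc_support; then
  echo "wget supports --warc-file"
else
  echo "wget is missing or older than 1.14 (no --warc-file)"
fi
```

A missing binary and an old build both land in the second branch, which is enough to know whether a WARC grab will work.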
13:56 🔗 SadDM midas: with what I've suggested you'll end up with a warc for every line of the text file... if that's not ultimately desirable, you can megawarc them all together after the fact.
13:58 🔗 rocode There are only 156 sites. Not too bad.
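A minimal sketch of the `wget -i` approach, assuming `usernames.txt` holds one username per line and each site lives at `http://home.xmsnet.nl/<username>/`. In wget 1.14 and later, `--warc-file=NAME` should be a base name, with everything fetched in that run going into a single `NAME.warc.gz`, so this yields one WARC for all sites rather than one per line. The file names `urls.txt` and `xmsnet` and the `make_urls` helper are hypothetical:

```shell
#!/bin/sh
# Sketch: grab every home.xmsnet.nl user site in one wget run, one WARC.
# Assumes usernames.txt holds one username per line and that each site
# lives at http://home.xmsnet.nl/<username>/ (as worked out in the log).

# Print one URL per non-blank line of the username file given as $1.
make_urls() {
  sed -e '/^[[:space:]]*$/d' -e 's|^\(.*\)$|http://home.xmsnet.nl/\1/|' "$1"
}

# Usage: sh grab.sh usernames.txt
if [ -n "$1" ]; then
  make_urls "$1" > urls.txt
  # -i reads every URL from urls.txt; --warc-file=xmsnet is a base name,
  # so wget writes all captures from this run into xmsnet.warc.gz.
  # -np keeps each recursive mirror inside its own user directory.
  wget -m -np -i urls.txt --warc-file=xmsnet
fi
```

`-np` is a judgment call; drop it if the user sites link to shared pages elsewhere on home.xmsnet.nl that should be captured too.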
14:27 🔗 midas SadDM: sorry, was out of irc for a sec
14:27 🔗 midas yeah, -i would work with --warc-file ?
14:28 🔗 SadDM it sure does
14:28 🔗 SadDM you just end up with one for every url in the file
14:28 🔗 midas ok, lets start a screen :p
14:28 🔗 midas thats no issue
14:32 🔗 midas wget wants an argument for warc-file
14:33 🔗 midas (i feel stupid right now :p)
14:35 🔗 SadDM oh... uh
14:35 🔗 SadDM hmm... maybe I got ahead of myself
14:35 🔗 ohhdemgir I keep cocking up my media types, wish users could move items :|
14:35 🔗 midas my guess was just wget -i <file> -m --warc-file but that's no baloney
14:35 🔗 SadDM I know that I've done something like this
14:36 🔗 SadDM maybe I just built a bunch of wget commands in a file and then piped it through a shell
14:37 🔗 SadDM sorry midas, I think I've given you slightly bad advice
14:37 🔗 SadDM are you on a unix machine?
14:40 🔗 midas yeah
14:40 🔗 SadDM something like this should generate the wgets: cat usernames.txt | sed 's|\(.*\)|wget -m --warc-file=\1 http://home.xmsnet.nl/\1/|' > wgets
14:41 🔗 SadDM note that I just wrote that cold and didn't test it or anything
14:41 🔗 SadDM then just "sh wgets"
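The same one-WARC-per-user grab can also be written as a plain loop, which avoids generating and then running a `wgets` file. A sketch under the same assumptions (one username per line in `usernames.txt`, sites under `http://home.xmsnet.nl/<username>/`); `grab_user` and `grab_all` are hypothetical names:

```shell
#!/bin/sh
# Sketch: one wget run, and one WARC, per username, as a loop instead of
# generating a "wgets" script with sed. Same assumptions: usernames.txt has
# one username per line; sites live at http://home.xmsnet.nl/<username>/.

grab_user() {
  # --warc-file takes a base name; wget appends .warc.gz itself.
  wget -m -np --warc-file="$1" "http://home.xmsnet.nl/$1/"
}

grab_all() {
  # The extra -n test also handles a final line without a trailing newline.
  while IFS= read -r user || [ -n "$user" ]; do
    [ -n "$user" ] || continue   # skip blank lines
    grab_user "$user" < /dev/null  # keep wget off the loop's stdin
  done < "$1"
}

# Usage: sh grab_each.sh usernames.txt
if [ -n "$1" ]; then
  grab_all "$1"
fi
```

Run it inside screen as planned; the per-user WARCs can still be combined afterwards if a single file is wanted.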
14:41 🔗 midas lol. thanks! i think that would work :)
14:42 🔗 SadDM :-D it's close-ish at least
14:42 🔗 midas indeed, and thats more than i could do on short notice :p
14:48 🔗 godane so i fixed some of the uploads of Sydney Morning Herald and Australian Women's Weekly
14:48 🔗 godane some of the uploads were incomplete downloads
14:50 🔗 Smiley midas: around bud?
14:50 🔗 godane so i'm mirroring more glenn beck episodes
14:51 🔗 midas Smiley: yeah
14:51 🔗 midas im always somewhere
14:51 🔗 Smiley midas: how are/were you testing Cartoon HD on Android?
14:51 🔗 Smiley you running an AVD on your system?
14:52 🔗 midas didnt have time since last week, work said i had to do something :<
14:53 🔗 midas i think ill debug it tonight again
14:53 🔗 midas just have to get this grab started
14:54 🔗 Smiley yeah I'm just trying to set up an AVD so I can just grab stuff rather than grabbing on my phone and pushing across the network
14:54 🔗 Smiley ah got it working now :D
14:55 🔗 * Smiley waits to see if it loads
14:57 🔗 midas ah bloody cool Smiley !
14:58 🔗 Smiley midas: hmmmmm just sitting on "android" screen atm :/
15:24 🔗 Smiley midas: just run android avd
15:24 🔗 Smiley and create a new avd using a nice h/w
15:26 🔗 Smiley or at least that's what I'm trying to do.
15:26 🔗 Smiley it's quite slow.
15:27 🔗 Smiley as I just told it to create a 5GB SD card D:
15:37 🔗 midas lol
15:37 🔗 * midas throws stones at midas
15:46 🔗 Smiley grrr
15:46 🔗 Smiley none will boot D:
15:51 🔗 Smiley yey got one booting \o/
15:59 🔗 Smiley and using android monitor, it has a file explorer with push/pull
15:59 🔗 Smiley or i can use adb. sweeeet
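For the AVD workflow above, the push/pull can also be done from a terminal with `adb`. A minimal sketch; the device directory `/sdcard/Download` and the helper names are hypothetical examples, and `-e` simply targets the one running emulator:

```shell
#!/bin/sh
# Sketch: move files between the host and a running emulator (AVD) with adb,
# as an alternative to the Android Device Monitor's file explorer.
# DEVICE_DIR is a hypothetical example path, not something from the log.

DEVICE_DIR=${DEVICE_DIR:-/sdcard/Download}

# Copy a file from the emulator to the host (default: current directory).
pull_from_avd() {
  # -e sends the command to the only running emulator.
  adb -e pull "$DEVICE_DIR/$1" "${2:-.}"
}

# Copy a host file into DEVICE_DIR on the emulator.
push_to_avd() {
  adb -e push "$1" "$DEVICE_DIR/"
}

if command -v adb >/dev/null 2>&1; then
  adb devices   # the emulator should show up, e.g. as emulator-5554
else
  echo "adb not found; it ships with the Android SDK platform-tools"
fi
```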
16:45 🔗 yipdw lol Wheeler
16:45 🔗 yipdw "Simply put, when a consumer buys a specified bandwidth, it is commercially unreasonable and thus a violation of this proposal to deny them the full connectivity and the full benefits that connection enables."
16:45 🔗 yipdw evidently Wheeler has never used Comcast services, only lobbied for them
17:10 🔗 exmic hahaha
18:02 🔗 schbirid lol wtf, startpage.com's "community" link goes to their facebook site
18:07 🔗 midas so this is about archive.org and a little bit about archiveteam also ;) http://www.nu.nl/weekend/3769630/bibliothecaris-van-internet-wil-websites-niet-verloren-laten-gaan.html
18:07 🔗 midas pretty big website in .nl
20:37 🔗 godane i'm grabbing Bomb Patrol Afghanistan
20:37 🔗 godane cause it aired on G4
20:37 🔗 godane i'm getting the 720p copies
20:43 🔗 exmic ./wg 25
22:17 🔗 godane so i'm up to 7600 items now in my godaneinbox
22:53 🔗 godane so i'm trying to download a mov file from the Wayback Machine and i can't get the full file
22:53 🔗 godane i'm trying this one at the moment: https://web.archive.org/web/20060602014144/http://www.commandn.tv/cN/044/commandN-044-h264.mov
22:54 🔗 godane the Wayback Machine url just stops about 32.7 MB into the file
22:54 🔗 godane even with wget
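One way to confirm the truncation and try to get past it is to compare the bytes on disk with the `Content-Length` the server reports, then resume with a byte-range request. This is a sketch only: it assumes curl is available and that the Wayback Machine both reports `Content-Length` and honours `Range` for this capture, neither of which the log confirms. The helper names are hypothetical:

```shell
#!/bin/sh
# Sketch: detect a truncated download by comparing bytes on disk with the
# server's Content-Length, and retry with curl's resume (-C -), which sends
# a Range request. Assumes the server reports Content-Length and honours
# Range; whether the Wayback Machine does so for a given capture is untested.

# Print the last Content-Length found in raw HTTP headers read from stdin
# (with -L, the last block of headers belongs to the final response).
content_length() {
  tr -d '\r' | awk 'tolower($1) == "content-length:" { n = $2 } END { if (n != "") print n }'
}

# Size of a local file in bytes, or 0 if it does not exist yet.
local_size() {
  if [ -f "$1" ]; then wc -c < "$1" | tr -d ' '; else echo 0; fi
}

# Usage: fetch_with_resume URL OUTFILE [TRIES]
fetch_with_resume() {
  url=$1; out=$2; tries=${3:-5}
  expected=$(curl -sIL "$url" | content_length)
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -C - makes curl continue from the current size of $out.
    curl -sL -C - -o "$out" "$url"
    have=$(local_size "$out")
    if [ -z "$expected" ]; then
      echo "no Content-Length reported; got $have bytes" >&2
      return 0
    fi
    [ "$have" -ge "$expected" ] && return 0
    echo "got $have of $expected bytes, retrying" >&2
    i=$((i + 1))
  done
  return 1
}

if [ -n "$1" ] && [ -n "$2" ]; then
  fetch_with_resume "$1" "$2"
fi
```

Usage, with a hypothetical script name: `sh resume.sh 'https://web.archive.org/web/20060602014144/http://www.commandn.tv/cN/044/commandN-044-h264.mov' commandN-044-h264.mov`. If every retry stops at the same size, the capture itself is probably truncated rather than the transfer.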
23:21 🔗 godane some good news
23:21 🔗 godane looks like i may be able to get some mp4 files of commandN through veoh.com
23:25 🔗 godane and also here is the sitemap of veoh.com: http://www.veoh.com/sitemap.xml
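Sitemaps in the sitemaps.org format list each URL in a `<loc>` element, and a top-level `sitemap.xml` is often an index pointing at further sitemaps, so a first pass at enumerating veoh.com could just pull out those values. A minimal sketch; `sitemap_locs` is a hypothetical name, it uses `grep -o` (GNU and BSD grep, not strict POSIX), and nested sitemaps served as `.xml.gz` would need `gunzip` first:

```shell
#!/bin/sh
# Sketch: list the URLs in a sitemap such as http://www.veoh.com/sitemap.xml.
# In the sitemaps.org format each URL sits in a <loc> element; an index
# sitemap's <loc> entries point at further sitemaps to fetch the same way.

# Print the text of every <loc>...</loc> element in XML read from stdin,
# turning the required &amp; escape back into a plain &.
sitemap_locs() {
  grep -o '<loc>[^<]*</loc>' |
    sed -e 's|<loc>||' -e 's|</loc>||' -e 's|&amp;|\&|g'
}

# Usage: sh sitemap.sh http://www.veoh.com/sitemap.xml > locs.txt
if [ -n "$1" ]; then
  curl -s "$1" | sitemap_locs
fi
```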
23:30 🔗 godane so based on one of the veoh.com html files, you can get the full path of the videos from them
23:40 🔗 godane here is some example code for grabbing the video files from veoh.com:
23:40 🔗 godane curl http://www.veoh.com/watch/e20095 | grep fullPreviewHashHighPath | sed 's|.*fullPreviewHashHighPath":"||g' | sed 's|",".*||g' | sed 's|.*/content|content|g'
23:41 🔗 godane you may want to add a -O h$id.mp4 or something, or you get file names like the one below:
23:41 🔗 godane h20095.mp4?ct=ebfae74e540fcd7e297a588892beca41022dc1cd2d5c355d
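A sketch that wraps the veoh.com one-liner above for a list of IDs and applies the `-O h$id.mp4` naming. Assumptions, not confirmed in the log: IDs look like `e20095`; the extracted value becomes a usable URL once a prefix is put in front of it (the hypothetical `VIDEO_BASE`, default `http://`); and JSON-escaped slashes (`\/`) in the page need converting back to `/`. Helper names are hypothetical:

```shell
#!/bin/sh
# Sketch: run the veoh.com one-liner over several video IDs and save each
# file as h<number>.mp4. Assumptions, not confirmed in the log: IDs look
# like "e20095"; the extracted value becomes a URL once VIDEO_BASE
# (hypothetical, default "http://") is put in front of it; and JSON-escaped
# slashes in the page need turning back into "/".

VIDEO_BASE=${VIDEO_BASE:-http://}

# Read a watch page's HTML on stdin and print the high-quality video path,
# using the same grep/sed steps as the one-liner, plus unescaping of \/.
extract_path() {
  grep fullPreviewHashHighPath |
    sed 's|.*fullPreviewHashHighPath":"||g' |
    sed 's|",".*||g' |
    sed 's|.*/content|content|g' |
    sed 's|\\/|/|g' |
    head -n 1
}

# "e20095" -> "h20095.mp4", instead of names like h20095.mp4?ct=...
output_name() {
  echo "h${1#e}.mp4"
}

grab_video() {
  path=$(curl -s "http://www.veoh.com/watch/$1" | extract_path)
  if [ -z "$path" ]; then
    echo "no video path found for $1" >&2
    return 1
  fi
  wget -O "$(output_name "$1")" "$VIDEO_BASE$path"
}

# Usage: sh veoh.sh e20095 e20096 ...
for id in "$@"; do
  grab_video "$id"
done
```

Check the first extracted path by hand before a large run; if it does not resolve with `http://` in front, set `VIDEO_BASE` to whatever prefix the page actually needs.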
23:53 🔗 godane looks like there is no page on archiveteam.org for veoh.com
