#archiveteam-bs 2014-05-15,Thu

Time	Nickname	Message
09:04 ^🔗	midas	so, was that a small internet earthquake?
10:54 ^🔗	schbirid	have some html injection https://startpage.com/do/search?query=Reginald+D.+Hunter&cat=web&pl=chrome&language=english
13:30 ^🔗	midas	can someone point me in the right direction? i have a list of usernames for home.xmsnet.nl now i just need to figure out how i can get wget to work with this to warc it.
13:34 ^🔗	midas	right, bad snort.
13:35 ^🔗	midas	can someone point me in the right direction? i have a list of usernames for home.xmsnet.nl now i just need to figure out how i can get wget to work with this to warc it.
13:49 ^🔗	SadDM	midas: is there any correlation between the usernames and urls?
13:51 ^🔗	rocode_	home.xmsnet.nl/username IIRC
13:51 ^🔗	SadDM	hmm, it looks like the urls are http://home.xmsnet.nl/username
13:51 ^🔗	SadDM	right
13:52 ^🔗	rocode	Which version of wget was warc enabled?
13:52 ^🔗	SadDM	so the easiest thing to do is to make a text file with a list of those urls and then do a wget -i <filename>
13:53 ^🔗	SadDM	1.14 maybe?
13:54 ^🔗	SadDM	yup... >=1.14 has warc baked in
13:56 ^🔗	SadDM	midas: with what I've suggested you'll end up with a warc for every line of the text file... if that's not ultimately desirable, you can megawarc them all together after the fact.
13:58 ^🔗	rocode	There are only 156 sites. Not too bad.
14:27 ^🔗	midas	SadDM: sorry, was out of irc for a sec
14:27 ^🔗	midas	yeah, -i would work with --warc-file ?
14:28 ^🔗	SadDM	it sure does
14:28 ^🔗	SadDM	you just end up with one for every url in the file
14:28 ^🔗	midas	ok, lets start a screen :p
14:28 ^🔗	midas	thats no issue
14:32 ^🔗	midas	wget wants an argument for warc-file
14:33 ^🔗	midas	(i feel stupid right now :p)
14:35 ^🔗	SadDM	oh... uh
14:35 ^🔗	SadDM	hmm... maybe I got ahead of myself
14:35 ^🔗	ohhdemgir	I keep cocking up my media types, wish users could move items :|
14:35 ^🔗	midas	my guess was just wget -i <file> -m --warc-file but thats no balony
14:35 ^🔗	SadDM	I know that I've done something like this
14:36 ^🔗	SadDM	maybe I just built a bunch of wget commands in a file and then piped it through a shell
14:37 ^🔗	SadDM	sorry midas, I think I've given you slightly bad advice
14:37 ^🔗	SadDM	are you on a unix machine?
14:40 ^🔗	midas	yeah
14:40 ^🔗	SadDM	something like this should generate the wgets: cat usernames.txt|sed 's/$.*$/wget -m --warc-file=\1 http:\/\/home.xmsnet.nl/\1//' > wgets
14:41 ^🔗	SadDM	note that I just wrote that cold and didn't test it or anything
14:41 ^🔗	SadDM	then just "sh wgets"
14:41 ^🔗	midas	lol. thanks! i think that would work :)
14:42 ^🔗	SadDM	:-D it's close-ish at least
14:42 ^🔗	midas	indeed, and thats more than i could do on short notice :p
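As written, the sed one-liner above would fail: the pattern `$.*$` defines no capture group for `\1` to refer to, and the slash after `home.xmsnet.nl` is unescaped, so it ends the replacement early. A minimal sketch of the same idea follows, assuming `usernames.txt` holds one username per line with no spaces or shell metacharacters. The function name `gen_wgets` is hypothetical and not from the log.

```shell
#!/bin/sh
# gen_wgets FILE
# Print one "wget -m --warc-file=<user> <url>" command per username in FILE.
# Assumption: usernames are plain tokens (no spaces, quotes, or shell
# metacharacters), since each one is pasted unquoted into a command line.
gen_wgets() {
    # Drop empty lines, then capture the whole line as \1.
    # '|' is used as the sed delimiter so the URL slashes need no escaping.
    sed -e '/^$/d' \
        -e 's|^\(.*\)$|wget -m --warc-file=\1 http://home.xmsnet.nl/\1/|' "$1"
}
```

Usage would look like `gen_wgets usernames.txt > wgets` followed by `sh wgets`, which gives `--warc-file` the argument it wants and writes one WARC per username.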
14:48 ^🔗	godane	so i fixed some of the uploads of sydney morning herald and Australian women's weekly
14:48 ^🔗	godane	some of the uploads were incomplete downloads
14:50 ^🔗	Smiley	midas: around bud?
14:50 ^🔗	godane	so i'm mirroring more glenn beck episodes
14:51 ^🔗	midas	Smiley: yeah
14:51 ^🔗	midas	im always somewhere
14:51 ^🔗	Smiley	midas: how are/were you testing cartoon hd on android,
14:51 ^🔗	Smiley	you running an avd on your system?
14:52 ^🔗	midas	didnt have time since last week, work said i had to do something :<
14:53 ^🔗	midas	i think ill debug it tonight again
14:53 ^🔗	midas	just have to get this grab started
14:54 ^🔗	Smiley	yeah I'm just trying to set up an AVD so I can just grab stuff rather than grabbing on my phone and pushing across the network
14:54 ^🔗	Smiley	ah got it working now :D
14:55 ^🔗	*	Smiley waits to see if it loads
14:57 ^🔗	midas	ah bloody cool Smiley !
14:58 ^🔗	Smiley	midas: hmmmmm just sitting on "android" screen atm :/
15:24 ^🔗	Smiley	midas: just run android avd
15:24 ^🔗	Smiley	and create a new avd using a nice h/w
15:26 ^🔗	Smiley	or at least that's what I'm trying to do.
15:26 ^🔗	Smiley	it's quite slow.
15:27 ^🔗	Smiley	as I just told it to create a 5Gb sdcard D:
15:37 ^🔗	midas	lol
15:37 ^🔗	*	midas throws stones at midas
15:46 ^🔗	Smiley	grrr
15:46 ^🔗	Smiley	none will boot D:
15:51 ^🔗	Smiley	yey got one booting \o/
15:59 ^🔗	Smiley	and using android monitor, it has a file explorer with push/pull
15:59 ^🔗	Smiley	or i can use adb. sweeeet
16:45 ^🔗	yipdw	lol Wheeler
16:45 ^🔗	yipdw	"Simply put, when a consumer buys a specified bandwidth, it is commercially unreasonable and thus a violation of this proposal to deny them the full connectivity and the full benefits that connection enables."
16:45 ^🔗	yipdw	evidently Wheeler has never used Comcast services, only lobbied for them
17:10 ^🔗	exmic	hahaha
18:02 ^🔗	schbirid	lol wtf, startpage.com's "community" link goes to their facebook site
18:07 ^🔗	midas	so this is about archive.org and a little bit about archiveteam also ;) http://www.nu.nl/weekend/3769630/bibliothecaris-van-internet-wil-websites-niet-verloren-laten-gaan.html
18:07 ^🔗	midas	pretty big website in .nl
20:37 ^🔗	godane	i'm grabbing Bomb Patrol Afghanistan
20:37 ^🔗	godane	cause it aired on G4
20:37 ^🔗	godane	i'm getting the 720p copies
20:43 ^🔗	exmic	./wg 25
22:17 ^🔗	godane	so i'm up to 7600 items now in my godaneinbox
22:53 ^🔗	godane	so i'm trying to download a mov file from the wayback machine and i can't get the full file
22:53 ^🔗	godane	i'm trying this one at the moment: https://web.archive.org/web/20060602014144/http://www.commandn.tv/cN/044/commandN-044-h264.mov
22:54 ^🔗	godane	the wayback machine url will just stop about 32.7mb into the file
22:54 ^🔗	godane	even with wget
23:21 ^🔗	godane	some good news
23:21 ^🔗	godane	looks like i may be able to get some mp4 files of commandN through veoh.com
23:25 ^🔗	godane	and also here is the sitemap of veoh.com: http://www.veoh.com/sitemap.xml
23:30 ^🔗	godane	so based on one of the veoh.com html files you can get the full path of videos from them
23:40 ^🔗	godane	here is some example code for grabbing the video files from veoh.com:
23:40 ^🔗	godane	curl http://www.veoh.com/watch/e20095 | grep fullPreviewHashHighPath | sed 's|.*fullPreviewHashHighPath":"||g' | sed 's|",".*||g' | sed 's|.*/content|content|g'
23:41 ^🔗	godane	you may want to add a -O h$id.mp4 or some thing or you get file names like this below:
23:41 ^🔗	godane	h20095.mp4?ct=ebfae74e540fcd7e297a588892beca41022dc1cd2d5c355d
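The extraction step in the pipeline above can be wrapped in a small function so it can be tried on saved HTML without hitting the site. This is only a sketch: it assumes the page contains a `"fullPreviewHashHighPath":"..."` key/value pair as the log suggests, and the names `extract_veoh_path` and `veoh_outfile` are hypothetical. The real page layout was not checked here.

```shell
#!/bin/sh
# extract_veoh_path: read HTML/JSON on stdin and print the value of the
# fullPreviewHashHighPath field, trimmed so that it starts at "content".
# Assumption: the value contains no embedded double quotes.
extract_veoh_path() {
    grep -o '"fullPreviewHashHighPath":"[^"]*"' |
        sed -e 's|^"fullPreviewHashHighPath":"||' \
            -e 's|"$||' \
            -e 's|.*/content|content|'
}

# veoh_outfile ID: build a clean output name such as h20095.mp4, to avoid
# saved file names that carry the "?ct=..." query string.
veoh_outfile() {
    printf 'h%s.mp4\n' "$1"
}
```

For example, `curl http://www.veoh.com/watch/e20095 | extract_veoh_path` would print the path, and `veoh_outfile 20095` gives a name that could be passed to a downloader's `-O` option.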
23:53 ^🔗	godane	looks like there is no page on archiveteam.org for veoh.com