#archiveteam-bs 2018-02-01,Thu


Time Nickname Message
00:07 🔗 REiN^ has quit IRC (Read error: Operation timed out)
00:13 🔗 REiN^ has joined #archiveteam-bs
00:30 🔗 JAA Total size from those is 85.6 GiB, by the way.
00:47 🔗 jacketcha has quit IRC (Read error: Connection reset by peer)
01:01 🔗 godane SketchCow: i'm now down to 3 tapes i need to digitize from your box
01:02 🔗 godane this one tape is going to have a file called random-mtv-the-prisoner-incomplete-fall-1991 or something
01:02 🔗 godane cause it goes very random with no complete blocks
02:24 🔗 Soni has quit IRC (Ping timeout: 264 seconds)
02:28 🔗 Atom-- has joined #archiveteam-bs
02:33 🔗 Soni has joined #archiveteam-bs
03:37 🔗 MrDignity has quit IRC (Read error: Operation timed out)
03:37 🔗 MrDignity has joined #archiveteam-bs
03:53 🔗 Valentine has quit IRC (Read error: Operation timed out)
03:58 🔗 Valentine has joined #archiveteam-bs
04:18 🔗 BlueMaxim has quit IRC (Leaving)
04:44 🔗 qw3rty119 has joined #archiveteam-bs
04:48 🔗 qw3rty118 has quit IRC (Read error: Operation timed out)
05:33 🔗 BlueMaxim has joined #archiveteam-bs
05:38 🔗 hook54321 JAA: ok, I'll ask some people in a Catalan discord server if there's any reason to keep grabbing them, I haven't really been following up on what's going on with Catalonia and Spain, lol.
05:41 🔗 Jonimus has quit IRC (Read error: Operation timed out)
05:42 🔗 dashcloud has quit IRC (No Ping reply in 180 seconds.)
05:43 🔗 Jonimus has joined #archiveteam-bs
05:43 🔗 swebb sets mode: +o Jonimus
05:44 🔗 hook54321 !ig 10sb194o1dqht3i8nmqxeeg6g ^https?://www\.oswegofirst\.org/
05:44 🔗 dashcloud has joined #archiveteam-bs
05:45 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
05:46 🔗 BlueMaxim has joined #archiveteam-bs
05:51 🔗 godane SketchCow: so i've done 37,055 items this month
06:31 🔗 SketchCow YES
06:52 🔗 godane i have been very slow with my uploads over the past year
06:53 🔗 godane i'm trying to get tons of dtic pdfs uploaded this year
06:53 🔗 godane so i can be done
08:31 🔗 Mateon1 has quit IRC (Read error: Operation timed out)
08:32 🔗 Mateon1 has joined #archiveteam-bs
08:34 🔗 godane SketchCow: btw the box you mailed me had audio tapes for some reason
09:07 🔗 REiN^ has quit IRC (no.money.no.love)
09:08 🔗 REiN^ has joined #archiveteam-bs
09:11 🔗 BlueMaxim has quit IRC (Ping timeout: 252 seconds)
09:12 🔗 BlueMaxim has joined #archiveteam-bs
09:36 🔗 REiN^ has quit IRC (no.money.no.love)
09:50 🔗 REiN^ has joined #archiveteam-bs
09:57 🔗 schbirid has joined #archiveteam-bs
11:22 🔗 schbirid should we make some strava map tile scraping a warrior project? i have no idea how to do that but the tiles are super easy to get and they throw proper errors if you get banned (seems to be based on volume or patterns, not sure)
11:23 🔗 schbirid up to zoomlevel 11 (including) was easy to do on a single connection
11:23 🔗 schbirid but 12 is just too huge
11:27 🔗 schbirid there are (2**12)**2 -> 16777216 tiles at that level (total filesize would be around 60G uncompressed)
11:27 🔗 schbirid getting 4 or 5 tiles per second has not gotten me banned so far but eh
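
The tile arithmetic above can be checked with a short sketch. It assumes the usual web-map layout of a 2**z by 2**z grid per zoom level, which is what the (2**12)**2 figure implies; the function names are made up for illustration, and the 4 tiles per second rate is only the rough figure quoted in the channel.

```python
SECONDS_PER_DAY = 86400


def tiles_at_zoom(zoom: int) -> int:
    """Tile count for one zoom level, assuming a 2**zoom by 2**zoom grid."""
    return (2 ** zoom) ** 2


def days_to_fetch(n_tiles: int, tiles_per_second: float) -> float:
    """Wall-clock days needed on one connection at a steady request rate."""
    return n_tiles / tiles_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    level_12 = tiles_at_zoom(12)
    print("tiles at zoom 12:", level_12)
    # All levels up to and including 11 together are much smaller than level 12.
    print("tiles at zoom 0-11:", sum(tiles_at_zoom(z) for z in range(12)))
    print("days at 4 tiles/s:", round(days_to_fetch(level_12, 4), 1))
```

At that rate a single connection would need several weeks for zoom level 12 alone, which is why splitting the work into per-person URL lists is attractive.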
11:34 🔗 ranav has quit IRC (Read error: Operation timed out)
11:55 🔗 ranavalon has joined #archiveteam-bs
12:03 🔗 pizzaiolo has joined #archiveteam-bs
12:04 🔗 BlueMaxim has quit IRC (Read error: Connection reset by peer)
12:28 🔗 Kaz Wouldn't be too hard to do I guess, if you want to do it
13:52 🔗 SilSte has joined #archiveteam-bs
14:04 🔗 Smiley schbirid: sounds possible, take to #warrior and ask if anyone can pick it up?
14:04 🔗 Smiley I'd try, but I have no clue xD
14:04 🔗 Smiley or if you have lists, just get a few people to pick up a range of tiles each and you should get it done pretty quick
14:29 🔗 antomatic has quit IRC (Read error: Connection reset by peer)
14:30 🔗 antomatic has joined #archiveteam-bs
14:30 🔗 swebb sets mode: +o antomatic
15:08 🔗 zhongfu has joined #archiveteam-bs
15:25 🔗 schbirid http://wilmunder.com/Arics_World/Games.html
15:33 🔗 zhongfu has quit IRC (Remote host closed the connection)
15:40 🔗 zhongfu has joined #archiveteam-bs
15:41 🔗 MrDignity has quit IRC (Remote host closed the connection)
15:50 🔗 zhongfu has quit IRC (Remote host closed the connection)
15:52 🔗 zhongfu has joined #archiveteam-bs
15:56 🔗 zhongfu has quit IRC (Remote host closed the connection)
15:56 🔗 zhongfu has joined #archiveteam-bs
16:01 🔗 godane SketchCow: i'm digitizing the last tape from that small box
16:16 🔗 SketchCow Thanks. So we should arrange the transfer back along with a note in the box saying "done", and then Mank can send you the next batch.
16:17 🔗 zhongfu has quit IRC (Remote host closed the connection)
16:52 🔗 zhongfu has joined #archiveteam-bs
17:04 🔗 pizzaiolo has quit IRC (Read error: Operation timed out)
17:08 🔗 pizzaiolo has joined #archiveteam-bs
17:11 🔗 pizzaiolo has quit IRC (Client Quit)
17:12 🔗 pizzaiolo has joined #archiveteam-bs
17:15 🔗 Jusque_ has joined #archiveteam-bs
17:16 🔗 Jusque has quit IRC (Read error: Operation timed out)
17:16 🔗 Jusque_ is now known as Jusque
17:49 🔗 MrDignity has joined #archiveteam-bs
17:49 🔗 MrDignity has quit IRC (Remote host closed the connection)
17:51 🔗 MrDignity has joined #archiveteam-bs
17:51 🔗 MrDignity has quit IRC (Read error: Connection reset by peer)
17:52 🔗 MrDignity has joined #archiveteam-bs
18:22 🔗 schbirid if you want to help me archive the strava heatmap tiles partially: https://justpaste.it/1gir1
18:22 🔗 schbirid single long-running wgets
18:23 🔗 schbirid i am doing aa and ab
18:29 🔗 JAA schbirid: How long does one list take approximately?
18:34 🔗 SketchCow That strava data is super doomed, so yeah
18:36 🔗 schbirid JAA: i get ~4 tiles per second so ~3 days
18:37 🔗 schbirid i am also doing ac
18:38 🔗 schbirid oh and ad. i have more servers than i remembered =)
18:43 🔗 pizzaiolo has quit IRC (Remote host closed the connection)
18:46 🔗 pizzaiolo has joined #archiveteam-bs
18:48 🔗 pizzaiolo has quit IRC (Client Quit)
18:50 🔗 pizzaiolo has joined #archiveteam-bs
18:52 🔗 arbin has joined #archiveteam-bs
18:53 🔗 dino has joined #archiveteam-bs
18:53 🔗 schbirid :)
18:53 🔗 * astrid waves
18:53 🔗 dino Hi
18:53 🔗 schbirid bs for everyone
18:54 🔗 JAA schbirid, arbin: Interesting, I get the same error with wget, but it works with curl.
18:54 🔗 schbirid hm, just do --no-check-certificate then.
18:54 🔗 arbin maybe wget has a baked in crappy version of openssl
18:54 🔗 arbin JAA: what was ur full cmd
18:54 🔗 JAA I wouldn't be surprised.
18:54 🔗 JAA curl URL > FILE
18:54 🔗 JAA Nothing special
18:55 🔗 astrid dino: do you have an example url to one of these paranormios?
18:55 🔗 arbin i think schbirid intends the leecher to unpack the gz, and iterate over the list of links in it
18:55 🔗 astrid oops, time for a meeting
18:55 🔗 arbin so just doing curl URL prob wont work
18:55 🔗 dino Maybe to say it again: I'm a photographer and have no idea what the archive process looks like. You're the pros here. I'm just wondering if it is possible to create a browsable version with the already saved images
18:55 🔗 JAA arbin: You need to download the list, gunzip it, and then run the wget command which reads that URL list.
18:56 🔗 pizzaiolo has quit IRC (pizzaiolo)
18:56 🔗 arbin schbirid: u intend to download the gz list through a normal browser?
18:56 🔗 schbirid i dont care
18:56 🔗 schbirid those are just url lists
18:56 🔗 schbirid you unpack them, then point wget to them
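
For anyone not using wget, a minimal sketch of the same "unpack the list, then iterate over the URLs" step in Python. It assumes each list is a gzip-compressed text file with one URL per line; `iter_urls` is a hypothetical helper name and the actual download step is left out.

```python
import gzip
from typing import Iterator


def iter_urls(path: str) -> Iterator[str]:
    """Yield URLs from a gzipped list, one line at a time.

    Reading lazily keeps memory use flat even for lists with
    around a million entries, since the whole file is never loaded at once.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            url = line.strip()
            if url:  # skip blank lines
                yield url


def count_urls(path: str) -> int:
    """Count the entries in a list without storing them."""
    return sum(1 for _ in iter_urls(path))
```

Calling `count_urls` on a downloaded list before starting gives a quick sanity check that the file unpacked correctly.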
18:58 🔗 dino ehm guys I don't know what you're talking about ;) Keeping the question simple: Is there a possibility to use/actually view and organize the archived files here:
18:58 🔗 dino https://archive.org/details/archiveteam_panoramio?&sort=-downloads&page=2
18:58 🔗 JAA There's another discussion going on about something completely unrelated.
18:58 🔗 arbin schbirid:
18:58 🔗 JAA A browsable archive of Panoramio would be lovely. I have no idea how that site worked or what the archives look like though.
18:58 🔗 arbin E:\schbird>wget -nc -nv --no-host-directories -x -i strava_12_shuffled_ae
18:58 🔗 arbin wget: memory exhausted
18:59 🔗 schbirid arbin: sorry, then this is not a project for you :)
18:59 🔗 arbin interesting that's an issue with 16GB RAM
18:59 🔗 pizzaiolo has joined #archiveteam-bs
18:59 🔗 dino I mean I love to see the images from Panoramio archived but they aren't really useful when you can't actually look at them
18:59 🔗 schbirid no issues with anything on a 768MB vps myself
18:59 🔗 schbirid bbl
19:00 🔗 arbin schbirid: why do you have to use wget? plenty of other ways to leech that much
19:00 🔗 JAA arbin: I don't think he cares. He just wants the data somehow, and wget is a simple way of achieving that.
19:01 🔗 JAA Uh, why are many of those Panoramio WARCs empty??
19:02 🔗 JAA Examples: https://archive.org/download/archiveteam_panoramio_20161105184626.upload https://archive.org/download/archiveteam_panoramio_20161106063218.upload https://archive.org/download/archiveteam_panoramio_20161027123253.upload https://archive.org/download/archiveteam_panoramio_20161105012626.upload
19:02 🔗 JAA None of these contain any data.
19:03 🔗 JAA Ah, further down the page.
19:03 🔗 dino Guys I'm not into the archiving process in any way. (and I don't understand half of what you're talking about) I just would love to know what's the idea of this "Panoramio Archive", If nobody can actually see the images? https://archive.org/details/archiveteam_panoramio?&sort=-downloads&page=2
19:03 🔗 JAA https://archive.org/download/archiveteam_panoramio_20170220050222 is the last file in the collection with data in it, it seems.
19:05 🔗 dino has quit IRC (Quit: Page closed)
19:05 🔗 JAA dino: You can look at the images, it's just really annoying to do so. For example, https://web.archive.org/web/20161013025252im_/http://mw2.google.com/mw-panoramio/photos/medium/11314446.jpg
19:05 🔗 JAA Welp
19:06 🔗 dino has joined #archiveteam-bs
19:06 🔗 JAA dino: You can look at the images, it's just really annoying to do so. For example, https://web.archive.org/web/20161013025252im_/http://mw2.google.com/mw-panoramio/photos/medium/11314446.jpg
19:06 🔗 dino ah okay
19:07 🔗 JAA Also stuff like this: https://web.archive.org/web/20161027205516/http://www.panoramio.com/userfeed/1412955
19:07 🔗 JAA I'm just discovering this through the files in the collection. I have no idea what the strategy was when it was grabbed (I wasn't here yet when that happened).
19:08 🔗 dino ah alright - thanks - so I need to try to connect to the uploader.
19:09 🔗 dino Maybe there is a way to make the library browsable again
19:09 🔗 JAA It appears that photo pages and user profiles were grabbed.
19:09 🔗 JAA https://web.archive.org/web/20161028204107/http://www.panoramio.com/photo/106502705
19:09 🔗 JAA https://web.archive.org/web/20161028204109/http://www.panoramio.com/user/1412955?with_photo_id=106502705
19:10 🔗 JAA But the content isn't easily discoverable because there is no search interface etc.
19:10 🔗 JAA Which is frequently the case, because that isn't really archiveable.
19:11 🔗 dino Thanks for the info! So do you think there's no way to get the images back on a map again?
19:12 🔗 JAA Not without a lot of effort.
19:13 🔗 JAA Basically, you'd have to download the archives, extract the coordinates for each image, and build a database of the (image URL, coordinates) pairs.
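
The database part of that idea could look roughly like the sketch below, using SQLite from the Python standard library. The table layout and function names are assumptions for illustration only; extracting the coordinates from the archived pages is not shown, because how they are stored there is not established in this discussion.

```python
import sqlite3
from typing import Iterable, List, Tuple

Row = Tuple[str, float, float]  # (image URL, latitude, longitude)


def build_index(db_path: str, rows: Iterable[Row]) -> sqlite3.Connection:
    """Create (or extend) a table of image URLs with their coordinates."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS photos ("
        "image_url TEXT PRIMARY KEY, lat REAL NOT NULL, lon REAL NOT NULL)"
    )
    # INSERT OR REPLACE lets a later, corrected location overwrite an older one.
    conn.executemany("INSERT OR REPLACE INTO photos VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def photos_in_box(conn: sqlite3.Connection, south: float, west: float,
                  north: float, east: float) -> List[str]:
    """Return image URLs inside a bounding box (no antimeridian handling)."""
    cur = conn.execute(
        "SELECT image_url FROM photos "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ? "
        "ORDER BY image_url",
        (south, north, west, east),
    )
    return [row[0] for row in cur]
```

A map front end would then only need bounding-box queries like `photos_in_box` to decide which archived images to show for the current view.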
19:14 🔗 dino I guess the location data is in the .exif data of the images
19:14 🔗 dino let me try to check that
19:15 🔗 arbin schbirid: ok, i'll grab ae
19:15 🔗 JAA Doesn't look like it, at least not always.
19:15 🔗 dino has quit IRC (Quit: Page closed)
19:17 🔗 dino has joined #archiveteam-bs
19:17 🔗 JAA The raw image of https://web.archive.org/web/20161029235055/http://www.panoramio.com/photo/91390820 is https://web.archive.org/web/20161029235228im_/http://static.panoramio.com/photos/original/91390820.jpg and contains a ton of EXIF data, but no coordinates as far as I can see.
19:17 🔗 JAA So it'd have to be extracted from the web page, I guess.
19:17 🔗 pizzaiolo has quit IRC (pizzaiolo)
19:18 🔗 dino yea I can't see any either
19:19 🔗 JAA Makes sense actually, since it looks like the location of a photo could be changed after upload at Panoramio ("Misplaced? Suggest new location").
19:19 🔗 dino oh you're right
19:20 🔗 dino crap, I'll contact Jason Scott (the creator of this archive) and ask him if he sees any option to show the images on a map
19:20 🔗 JAA Jason is here.
19:20 🔗 JAA SketchCow: ^
19:20 🔗 JAA I kind of doubt he knows more about it though.
19:20 🔗 pizzaiolo has joined #archiveteam-bs
19:21 🔗 JAA Note also that at least in that example, the EXIF data was overwritten when the image was edited. It doesn't contain any information about the camera, exposure, etc., but the web page does mention those details (in the lower right corner).
19:22 🔗 dino yea the data is definitely on the website somewhere and not in the .exif of the .jpg
19:22 🔗 JAA Yep
19:27 🔗 schbirid arbin: thx!
19:32 🔗 arbin you say these are approx 5gb uncompressed?
19:36 🔗 schbirid the set of tiles per url list will be about that
19:37 🔗 pizzaiolo has quit IRC (Remote host closed the connection)
19:39 🔗 arbin neato, i can download 2 sets at once
19:39 🔗 arbin stupidly shared university community wifi ftw
19:39 🔗 arbin schbirid: grabbing aq as well
19:39 🔗 schbirid cheers!
19:40 🔗 pizzaiolo has joined #archiveteam-bs
19:45 🔗 dino has quit IRC (Quit: Page closed)
20:01 🔗 pizzaiolo has quit IRC (Remote host closed the connection)
20:03 🔗 pizzaiolo has joined #archiveteam-bs
20:54 🔗 MrDignity has quit IRC (Remote host closed the connection)
20:55 🔗 MrDignity has joined #archiveteam-bs
21:06 🔗 pizzaiolo has quit IRC (Ping timeout: 247 seconds)
21:14 🔗 pizzaiolo has joined #archiveteam-bs
21:14 🔗 jschwart has joined #archiveteam-bs
21:15 🔗 SketchCow For the people who cared about this, there's a show tonight about that 1-tb drive offline internet in Cuba, with some of the artists who help maintain it, in NYC, and I'll be there.
21:16 🔗 WubTheCap I was watching that 34C3 talk right now
21:16 🔗 SketchCow JAA is quite right. I don't know about it. The project needs a caring, engaged person to go through the reams of data, derive value, and determine how to re-present it as best as possible.
21:16 🔗 WubTheCap a 34C3 talk about that*
21:16 🔗 SketchCow WubTheCap: <whispers> That talk is not incredibly great.
21:17 🔗 SketchCow It does have a few drabs of trivia about Euro BBSes I wasn't aware of, of course.
21:29 🔗 octothorp has quit IRC (Remote host closed the connection)
21:35 🔗 Dimtree has quit IRC (Read error: Operation timed out)
21:43 🔗 octothorp has joined #archiveteam-bs
21:46 🔗 Dimtree has joined #archiveteam-bs
21:50 🔗 pizzaiolo has quit IRC (pizzaiolo)
21:51 🔗 godane SketchCow: the last of your tapes is digitized now
22:04 🔗 schbirid has quit IRC (Quit: Leaving)
22:17 🔗 Dimtree has quit IRC (Read error: Connection reset by peer)
22:51 🔗 jschwart has quit IRC (Quit: Konversation terminated!)
22:57 🔗 Dimtree has joined #archiveteam-bs
23:06 🔗 godane SketchCow: so i got about 95gb of your tapes to upload
23:07 🔗 godane i also got another 30gb from the set of boxes that was odd parts of tapes
23:27 🔗 Dimtree has quit IRC (Read error: Operation timed out)
23:27 🔗 MrDignity has quit IRC (Read error: Connection reset by peer)
23:33 🔗 Dimtree has joined #archiveteam-bs
23:35 🔗 Dimtree has quit IRC (Client Quit)
23:49 🔗 Dimtree has joined #archiveteam-bs
