Time | Nickname | Message |
00:42 | tfgbd | Is there any kind of archive of product packages out there? |
00:43 | tfgbd | Lately (past year), I've been photographing every food/other package I could find |
00:47 | joepie91 | tfgbd: mm |
00:47 | joepie91 | this may be useful for a planned future project of mine |
00:47 | joepie91 | :) |
01:02 | DFJustin | wikimedia commons would probably take some of them if they're good quality |
01:08 | tfgbd | All angles? |
01:08 | tfgbd | I have all sides of the box |
01:08 | tfgbd | and the date can be gathered from the image metadata |
01:11 | tfgbd | Do you guys or archive.org ever work with sites like: http://www.oldversion.com/ or http://www.oldapps.com/ |
01:13 | DFJustin | work with, no |
01:13 | DFJustin | oldapps was partially crawled with archivebot |
01:14 | tfgbd | might be easier if they just gave you access to backups |
01:14 | tfgbd | why only partially? |
01:14 | DFJustin | job crashed |
01:14 | Diesel_ | Great idea, you're in charge |
01:14 | DFJustin | actually archive.org has gotten sets of data from some software sites in the past |
01:14 | DFJustin | tucows circa 2004 https://archive.org/details/tucows |
01:15 | tfgbd | yeah, I knew about tucows |
01:15 | DFJustin | old browser versions from evolt.org https://archive.org/details/evolt_browser_archive |
01:15 | tfgbd | wait, is tucows gone? |
01:15 | DFJustin | no it's still around last I checked |
01:15 | DFJustin | but the archive wasn't kept up to date |
01:15 | tfgbd | They just removed old stuff |
01:16 | tfgbd | I know there used to be lots of tucows mirrors around years back |
01:16 | tfgbd | they're like one of the few that had tons of mirrors |
01:16 | DFJustin | there's an archive team project to crawl all public ftps https://archive.org/details/ftpsites |
01:17 | tfgbd | do those use WARC too? |
01:17 | DFJustin | no |
01:17 | DFJustin | just tar or zip |
01:17 | tfgbd | where do you put them then? |
01:18 | DFJustin | archive.org file items |
01:18 | tfgbd | that sucks |
01:18 | DFJustin | wayback machine doesn't do ftp |
01:18 | tfgbd | you have to download the whole FTP? |
01:18 | tfgbd | well, at least they're there |
01:18 | DFJustin | depends on the site, some of them are too big and have to be split into subdirectories |
01:19 | tfgbd | maybe they will be able to be mirrored somewhere if they ever start a project for ftp |
01:19 | DFJustin | also archive.org lets you browse inside archive files |
01:19 | tfgbd | ahh |
01:20 | DFJustin | look for the "[contents]" link or add a / to the download link |
02:03 | espes__ | tfgbd: joepie91: DFJustin: if people are seriously interested in jdget then I'm interested in helping get it into a stage where it's maintainable and useful |
02:20 | tfgbd | cool |
02:20 | tfgbd | how does it deal with captchas, though |
02:31 | espes__ | it doesn't |
02:32 | espes__ | apart from JDownloader's captcha solver |
02:32 | espes__ | which I might even have disabled |
04:02 | APerti | http://www.ephotobay.com/image/shadowrun-snes-300.jpg |
04:35 | SketchCow | Boop |
04:35 | xmc | beep |
04:56 | APerti | Working on a Lemmings SNES scan for Psygnosis.org, Jason. |
05:19 | SketchCow | -bs, please. :) |
13:01 | joepie91 | espes__: does it actually run as native code? |
13:01 | joepie91 | or does it still use the JRE |
13:57 | espes__ | joepie91: it's all compiled to native code with GCJ |
13:58 | espes__ | you get one fat 80MB binary |
14:56 | Jonimus | You know what we need, a Waterboy parody video that replaces "Water Sucks" with "Yahoo Sucks" and "Gatorade is better" with "ArchiveTeam is better" |
14:56 | * | Jonimus debates working on that tonight. |
15:09 | joepie91 | Jonimus: we getting a theme song now? :D |
15:15 | qwerty0 | hey, does anyone know what happened to the google video archive? |
15:16 | qwerty0 | all I can find on archive.org is one item that looks like a captured one by them, not us: https://archive.org/details/GVID-20110417095014-crawl340 |
15:21 | Jonimus | joepie91: well my GF is a pretty good singer, if I can get something together we just might. |
15:22 | joepie91 | :D |
15:22 | joepie91 | SketchCow: see above question from qwerty0 |
15:22 | SketchCow | Good question. |
15:48 | Arkiver2 | SketchCow: I saw this contributor uploading old magazines: https://archive.org/search.php?query=uploader%3A%22paulo%40paulogarcia.com%22 |
15:48 | Arkiver2 | Maybe something for the magazine archive? |
15:51 | SketchCow | Sadly, they were already in there. |
15:51 | SketchCow | All of them, just checked. The exact files. |
15:57 | Arkiver2 | Hoped there was something new in there :/ |
15:58 | SketchCow | Nope, just someone blowing through the same collection I did, 2 years ago. |
15:58 | SketchCow | An hero |
15:59 | antomatic | Heh. I've got hundreds of old magazines nobody has scanned, anywhere. |
15:59 | antomatic | AAAAND all the coverdiscs and cover CDs too... |
16:00 | antomatic | Going back about 20 years |
16:00 | antomatic | Oh god, they take up so much room.. |
16:00 | antomatic | I have a problem. :( |
16:00 | qwerty0 | SketchCow: about google video, do you think you could find out where it went? |
16:21 | SketchCow | We grabbed metadata. |
16:22 | qwerty0 | Oh, I thought I remembered we handed over all the data for them to host. |
16:22 | qwerty0 | or, store, at least |
16:23 | SketchCow | I show just metadata. |
16:24 | qwerty0 | Damn. So where'd it go? I hope it wasn't discarded when they announced the Youtube migration feature. |
16:24 | SketchCow | google did a proper shutdown after we screamed |
16:25 | SketchCow | https://archive.org/details/google-video-metadata-dumpage |
16:27 | qwerty0 | It'd be a shame if it was lost, since a lot of videos never migrated and gv is toast now. |
16:27 | SketchCow | Well, good question what happened to the 18gb |
16:27 | qwerty0 | *TB? |
16:27 | SketchCow | Or tb. |
16:28 | qwerty0 | haha, good |
16:28 | SketchCow | I am not quite in the proper mood for this investigation. |
16:28 | SketchCow | First, please, do not talk as if this was the fall of rome. |
16:28 | SketchCow | We got Google to do a proper migration. |
16:29 | SketchCow | Second a lot of videos that didn't migrate basically failed the content filter. |
16:29 | SketchCow | Finally, there's no case where I or archive.org deleted data |
16:30 | SketchCow | Potentially, it got lost, maybe, but I doubt that. |
16:30 | SketchCow | But not if it went on archive.org. |
16:30 | SketchCow | 18tb back then would definitely have been a major deal to put on archive.org. |
16:30 | SketchCow | We are not perfect. |
16:30 | SketchCow | We're better than we were and worse than we will be. |
16:31 | SketchCow | archivebot solved a lot. |
16:32 | SketchCow | Boy, I better get on top of this energy issue |
16:32 | SketchCow | I don't like losing hours |
16:33 | SketchCow | I can't even find a record of what we did with the google video. |
16:35 | SketchCow | https://archive.org/details/googlevideo2011 |
16:35 | SketchCow | Looks like IA crawled it off archive team metadata |
16:36 | SketchCow | Justice. Justice was served. |
16:37 | DFJustin | I still have 20gb worth from my googlegarge folder, I thought it was rsynced to you at some point |
16:37 | DFJustin | *googlegargle |
16:38 | SketchCow | There is a chance |
16:39 | SketchCow | A slight one, this is after all 3 years ago |
16:39 | SketchCow | That what I did was work with Kenji to have him crawl the videos out with IA and then I'd delete our copies |
16:40 | SketchCow | Video continues to be our weak point |
16:40 | SketchCow | One little maniac with an HD camcorder can film himself eating a bowl of Captain Crunch for 20 minutes and there's 5gb |
16:41 | qwerty0 | oops, connection problem |
16:41 | joepie91 | qwerty0: http://sebsauvage.net/paste/?17aacded4c8c1d77#wzItqU02Q/D1JJ5jL6Wjhva/6z0D4EdwDnfjhVPpRKM= |
16:42 | DFJustin | might it be in a noindex collection somewhere? I remember there was some copyright concerns with e.g. stage6 |
16:43 | SketchCow | Possibly |
16:43 | SketchCow | But I am fairly sure, as that was my first year with IA, that Alexis would have told me to make it wayback friendly |
16:43 | qwerty0 | joepie91: awesome, thanks |
16:43 | SketchCow | And possibly, that meant the swapover via Kenji |
16:43 | SketchCow | Since this precedes MegaWARC and archivebot |
16:45 | qwerty0 | SketchCow: okay, yeah, to be clear: I'm not looking for blame or anything, just trying to do some follow-up |
16:45 | SketchCow | I wouldn't let files get deleted. |
16:45 | SketchCow | But they're likely in wayback as links. |
16:46 | qwerty0 | SketchCow: cool, yeah, that's the last thing I'd assume you'd do. |
16:49 | SketchCow | I'd say "it's somewhere" |
16:50 | SketchCow | But you need to know that archive team has a class of materials that are of dubious accessibility |
16:50 | SketchCow | One of our intentions was to make it so there were much less of those going forward |
16:50 | SketchCow | And I think we did well. |
16:50 | qwerty0 | Yeah, I figured it was just a matter of surfacing it to the public. |
16:50 | SketchCow | We were working on an audit but people got bored/lost |
16:50 | SketchCow | It's tedious work and not as sexy for Our Fine Men |
16:50 | qwerty0 | Right, exactly. |
16:51 | qwerty0 | I know IA is just a group of people trying to do the best with a whole bunch of efforts. |
16:52 | qwerty0 | So, easy to believe it could fall through the cracks. |
16:52 | DFJustin | #archiveteam.EFNet.20120807.log:[11:04:34] <SketchCow> [2:03:22 PM] Kenji Nagahashi (Internet Archive): for stat lovers: % of video migrated from Google Video to YouTube: 11% |
16:53 | qwerty0 | Is that the final stat? |
16:53 | DFJustin | probably, that's about when they were closing |
16:53 | SketchCow | Sounds right |
16:53 | DFJustin | dunno what % IA grabbed but I would assume a lot if he's making statements like that |
16:54 | SketchCow | SO MUCH of Google Video was DVD MPEGs shoved into the system |
16:54 | qwerty0 | We estimated we got 40% |
16:54 | qwerty0 | http://archiveteam.org/index.php?title=Google_Video#A_Brief_History |
16:55 | DFJustin | the googlevideo2011 collection weighs in at 72.19 TB |
16:55 | DFJustin | much more than we got |
16:55 | qwerty0 | huh.. |
16:56 | SketchCow | So I'd say, for the moment, relax. |
16:56 | SketchCow | I'm not being a bummer or a burnout. |
16:56 | qwerty0 | Yeah, that's awesome, if they grabbed a ton of video. |
16:57 | qwerty0 | I don't even remember hearing they had a parallel effort. |
16:59 | ersi | They're sneaky, you know. |
17:03 | qwerty0 | In that case, maybe they determined our stuff was just duplicating what they had, and discarded it. |
17:22 | VonCloud_ | anyone here ever heard of Boeing Calc |
18:18 | db48x | hey guys |
18:19 | db48x | does anyone know if it's possible to get a warc for a single site out of the Wayback Machine? |
18:21 | DFJustin | not possible |
18:23 | db48x | I was afraid of that |
18:25 | db48x | I guess I can spider the archive... |
18:27 | db48x | is it possible to download any of the warcs it serves from? |
18:29 | DFJustin | only the archive team ones |
18:39 | godane | so i just found something interesting |
18:39 | godane | looks like even when robots.txt is blocking a url path images still can be viewed |
18:40 | godane | example: https://web.archive.org/web/20000407082332im_/http://www.msnbc.com/modules/tvnews/today/today_left.jpg |
18:46 | db48x | SketchCow: ouch, the audit didn't really get very far |
18:54 | db48x | SketchCow: how were we checking that our WARCs have been integrated into the Wayback Machine? |
19:12 | db48x | oops: https://archive.org/details/archiveteam_yahooblog |
19:26 | xmc | yes? |
19:33 | db48x | xmc: compare to https://archive.org/details/archiveteam_yahooblogs |
19:33 | xmc | oh |
19:39 | SketchCow | We just need to get shit right |
19:39 | SketchCow | If you want to help |
19:39 | SketchCow | Come to #auditteam |
19:43 | VonGuard | hey SketchCow |
19:43 | VonGuard | ever heard of Boeing Calc? |
20:10 | SketchCow | No |
20:48 | brewskie1 | Hey so, the current default project is in the all-claimed stage, and it seems like most others are too, what project is a good idea to work on? |
20:50 | aaaaaaaaa | qwiki discovery |
20:50 | brewskie1 | Alright, because that'll get left open for a day or so |
20:51 | brewskie1 | Just wanted to make sure it'd get its hour's worth |